510 Commits

Author SHA1 Message Date
Dingyuan Wang
d0e68974bf improved doc for tmp_dir and cache_file 2015-06-26 22:24:20 +08:00
Dingyuan Wang
66fe17517d prevent moving across different filesystems at tempfile.mkstemp 2015-06-26 22:12:39 +08:00
Dingyuan Wang
be46ddef9a use shutil.move for all platforms in case of different filesystems 2015-06-26 21:52:53 +08:00
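The three commits above concern where the temporary cache file is created and how it is moved into place. A minimal sketch of that idea, assuming a hypothetical helper named `write_cache_atomically` (not jieba's actual code): the temporary file is created in the destination directory, so the final move normally stays on one filesystem, and `shutil.move` still works if it does not.

```python
import os
import shutil
import tempfile


def write_cache_atomically(cache_path, payload, tmp_dir=None):
    """Write bytes to cache_path through a temporary file, then move it into place.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    # Assumption: default to the cache file's own directory, so the temp file
    # and the destination usually share a filesystem.
    target_dir = tmp_dir or os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        # shutil.move falls back to copy-and-delete when a plain rename fails,
        # for example across filesystems.
        shutil.move(tmp_path, cache_path)
    except Exception:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return cache_path
```

Passing an explicit `tmp_dir` mirrors the case where the default location is not writable.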
Sun Junyi
17652e764f Merge pull request #271 from gumblex/master
Fix cut_for_search; improve the pair object
2015-06-01 18:40:31 +08:00
Dingyuan Wang
ceb5c26be4 fix self.FREQ in cut_for_search; make pair object iterable 2015-06-01 14:36:38 +08:00
Sun Junyi
9f4d9376b0 Merge pull request #269 from gumblex/master
Allow custom dictionaries to specify a POS tag while omitting the word frequency
2015-05-24 19:56:51 +08:00
Dingyuan Wang
3b76328f2a allow ignoring word frequency while providing pos tag 2015-05-23 21:51:00 +08:00
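The commit above makes the frequency field optional in a user dictionary line that still carries a POS tag. A rough sketch of such parsing, assuming whitespace-separated fields in the order word, frequency, tag; `parse_userdict_line` is a hypothetical name and this is not the library's own parser.

```python
def parse_userdict_line(line):
    """Parse 'word [freq] [tag]' into (word, freq, tag); missing fields become None.

    Hypothetical sketch: an all-digit field is treated as the frequency and
    any other field as the POS tag.
    """
    parts = line.split()
    if not parts:
        return None  # blank line
    word, freq, tag = parts[0], None, None
    for token in parts[1:]:
        if token.isdigit():
            freq = int(token)
        else:
            tag = token
    return word, freq, tag
```

With this sketch, a line holding only a word and a tag yields `freq=None`, leaving the caller free to pick a default frequency.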
Sun Junyi
3ec4c43788 Merge pull request #260 from gumblex/master
Wrap global functions in classes
2015-05-11 10:24:49 +08:00
Dingyuan Wang
94840a734c wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default

Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
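The API changes listed in the commit above follow a common pattern: state moves into a class, and module-level functions become bound methods of one default instance. The toy below illustrates only that pattern. Its greedy longest-match `cut` is an assumption for the sake of a runnable example and is not jieba's segmentation algorithm.

```python
class Tokenizer:
    """Toy tokenizer holding its own word set (illustration only)."""

    def __init__(self, words=()):
        self.words = set(words)
        self.max_len = max((len(w) for w in self.words), default=1)

    def add_word(self, word):
        self.words.add(word)
        self.max_len = max(self.max_len, len(word))

    def cut(self, sentence):
        """Yield tokens using greedy longest match against self.words."""
        i = 0
        while i < len(sentence):
            for size in range(min(self.max_len, len(sentence) - i), 0, -1):
                piece = sentence[i:i + size]
                # A single character is always accepted as a fallback.
                if size == 1 or piece in self.words:
                    yield piece
                    i += size
                    break

    def lcut(self, sentence):
        """Same as cut, but returns a list instead of a generator."""
        return list(self.cut(sentence))


# Default instance; the module-level names are its bound methods.
dt = Tokenizer()
cut = dt.cut
lcut = dt.lcut
add_word = dt.add_word
```

Because the module-level names are bound to `dt`, a call such as `add_word("ab")` affects only the default instance, while a separately constructed `Tokenizer()` keeps its own state.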
Sun Junyi
e359d08964 Merge pull request #257 from gip0/gip0-patch-1
fixed an error in load_userdict()
2015-05-02 17:27:16 +08:00
Gilbert Liu
f6e57ab2ae fixed an error in load_userdict() 2015-05-01 12:52:28 -07:00
Sun Junyi
60f0028175 Merge pull request #252 from fukuball/master
Update README
2015-04-28 22:42:40 +08:00
Fukuball Lin
e712a4de61 Update README
Add information about the PHP version of jieba
2015-04-28 22:05:26 +08:00
fxsjy
29d2b838dc a minor version on pypi, which removes *.pyc 2015-04-17 19:35:12 +08:00
fxsjy
c07b7fef54 hot-fix version for pull request #248 2015-04-10 18:54:51 +08:00
Sun Junyi
753c1be49c Merge pull request #248 from wangbin/master
exclude word fragments from FREQ in posseg.cut
2015-04-02 15:32:41 +08:00
Wang Bin
84ffa0d4bf exclude word fragments from FREQ 2015-04-02 11:06:55 +08:00
Sun Junyi
885417aed1 Merge pull request #247 from gumblex/master
Update docs
v0.36
2015-03-21 17:05:05 +08:00
Dingyuan Wang
eeaab012bf update docs 2015-03-21 10:53:42 +08:00
fxsjy
89481cfd84 version update 0.36 2015-03-20 11:00:55 +08:00
Sun Junyi
59aa8b69b1 Merge pull request #246 from gumblex/master
Add automatic word frequency
2015-03-16 10:10:53 +08:00
Dingyuan Wang
4fa2728fb6 update README about new features 2015-03-14 12:44:49 +08:00
Dingyuan Wang
4a552ca94f suggest word frequency, support passing str to add_word 2015-03-14 12:44:19 +08:00
Sun Junyi
1b4721ebb8 Merge pull request #179 from changyy/master
Add a configurable directory for the generated cache_file, so that jieba can run on read-only file systems such as Embedded Linux, Google App Engine, and Heroku
2015-02-28 10:05:52 +08:00
Yuan-Yi Chang
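62433a3205 Let jieba specify the directory where cache_file is generated, so that jieba can run on a read-only file system
1. Before calling jieba.cut() or related functions, set the directory through jieba.tmp_dir
2. When the application runs on a read-only file system, generate cache_file in advance so that jieba works normally
3. Real-world cases are Google App Engine and Heroku: the free tier of the former has only 128 MB of memory and cannot run jieba, while the free tier of the latter has 512 MB and runs it normally. Before deploying, generate cache_file locally, then deploy it together with the application to Google App Engine or Heroku.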
2015-02-27 17:25:59 +08:00
Sun Junyi
4b4aff6d89 Merge pull request #242 from gumblex/master
textrank details; docs update
2015-02-17 14:57:27 +08:00
Dingyuan Wang
f29430f49e details in textrank; update README 2015-02-16 21:25:55 +08:00
Sun Junyi
a4fb439070 Merge pull request #241 from sing1ee/master
improve some details based on other committers' advice
2015-02-16 20:41:06 +08:00
zhangcheng
01b7f6efcf improve some details based on other committers' advice 2015-02-16 20:35:45 +08:00
Sun Junyi
4e05cde07e Merge pull request #240 from sing1ee/master
build stable sort for graph iteration
2015-02-16 20:28:22 +08:00
zhangcheng
8b8c6c85d0 remove unused import 2015-02-16 15:51:05 +08:00
zhangcheng
a6d1b2479e build stable sort for graph iteration so that we get stable results, and adapt details for Python 3 2015-02-16 15:49:10 +08:00
zhangcheng
1152db7736 build stable sort for graph iteration, then we can get stable result. 2015-02-16 15:46:36 +08:00
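The commits above sort the graph before iterating so that repeated runs give the same ranking. The following sketch shows why ordering matters, using a simplified weighted rank with in-place updates. The function name `rank`, the damping value, and the update formula are assumptions for illustration, not the library's TextRank code.

```python
from collections import defaultdict


def rank(edges, damping=0.85, iterations=10):
    """Score nodes of an undirected weighted graph with a fixed iteration order.

    edges maps (node_a, node_b) to a positive weight. Hypothetical sketch.
    """
    graph = defaultdict(list)
    for (a, b), weight in edges.items():
        graph[a].append((b, weight))
        graph[b].append((a, weight))

    # Sort nodes and neighbor lists once, so every run performs the same
    # floating-point operations in the same order.
    nodes = sorted(graph)
    for node in nodes:
        graph[node].sort()

    out_sum = {n: sum(w for _, w in graph[n]) for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        for n in nodes:
            # Scores are updated in place, so visiting order affects the result.
            incoming = sum(w / out_sum[m] * score[m] for m, w in graph[n])
            score[n] = (1 - damping) + damping * incoming
    return score
```

Without the sorting step, the scores could depend on the order in which edges were inserted.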
fxsjy
49657c976d make extract_tags behavior compatible with previous version 2015-02-14 21:23:58 +08:00
fxsjy
abcaf3e475 fix bug: load_userdict 2015-02-14 19:56:38 +08:00
Jack
a06b7d388e fix bug in __main__.py 2015-02-12 14:08:39 +08:00
Sun Junyi
9ca5b69907 Merge pull request #238 from gumblex/master
use str.splitlines to avoid losing line breaks
2015-02-12 13:35:52 +08:00
Dingyuan Wang
f2b7183a71 use str.splitlines to avoid losing line breaks 2015-02-12 12:39:14 +08:00
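The commit above relies on `str.splitlines` with `keepends=True`, which keeps each line's terminator so that the processed text can be rejoined without losing line breaks. A small sketch, with `split_keep_breaks` as a hypothetical helper name:

```python
def split_keep_breaks(text):
    """Return (body, line_break) pairs; concatenating all parts restores text."""
    pieces = []
    for line in text.splitlines(True):  # True means keepends
        body = line.rstrip("\r\n")
        pieces.append((body, line[len(body):]))
    return pieces
```

Each body can be processed on its own and written back followed by its original terminator, so mixed `\r\n` and `\n` endings survive. Note that `str.splitlines` also splits on a few less common separators, which this sketch leaves inside the body.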
Sun Junyi
b14eb329e3 Merge pull request #237 from gumblex/master
Store prefixes directly in the word frequency dictionary
2015-02-12 11:27:25 +08:00
Dingyuan Wang
872a7039f2 Merge branch 'master' of https://github.com/fxsjy/jieba 2015-02-12 10:33:56 +08:00
Dingyuan Wang
f808ea0ebb use only one dict to store words and prefixes 2015-02-12 10:31:52 +08:00
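The commit above stores words and their prefixes in a single dictionary. One way to picture this, as a sketch under assumptions rather than the project's code: each proper prefix of a word is stored with frequency 0 unless it is itself a word, so a lookup can tell "not a word, but a longer word may follow" apart from "dead end". The names `build_freq` and `candidates` are hypothetical.

```python
def build_freq(entries):
    """Build one dict holding word frequencies and zero-valued prefixes."""
    freq = {}
    total = 0
    for word, count in entries:
        freq[word] = count
        total += count
        for end in range(1, len(word)):
            prefix = word[:end]
            # Never overwrite a real word's frequency with 0.
            if prefix not in freq:
                freq[prefix] = 0
    return freq, total


def candidates(freq, sentence, start):
    """Return end indexes of dictionary words beginning at position start."""
    ends = []
    end = start + 1
    fragment = sentence[start:end]
    # Stop as soon as the fragment is neither a word nor a prefix.
    while end <= len(sentence) and fragment in freq:
        if freq[fragment]:
            ends.append(end)
        end += 1
        fragment = sentence[start:end]
    return ends
```

The scan stops at the first fragment missing from the dictionary, so no separate prefix structure is needed.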
fxsjy
4d7b515801 Merge branch 'master' of https://github.com/fxsjy/jieba 2015-02-11 20:57:35 +08:00
fxsjy
5bfa43a781 fix test scripts 2015-02-11 20:46:48 +08:00
Dingyuan Wang
f3a53dd2da fix print() in tests 2015-02-11 20:45:55 +08:00
Sun Junyi
a229041e58 Merge pull request #234 from yanyiwu/patch-2
Update README.md
2015-02-11 18:48:47 +08:00
Yanyi Wu
5d321cbccd Update README.md 2015-02-11 17:37:32 +08:00
fxsjy
8cbb26a7b6 fix test_file.py 2015-02-11 16:47:57 +08:00
Sun Junyi
41b47b0593 Merge pull request #233 from gumblex/master
Merge jieba3k; compatible with Python 2/3
2015-02-11 15:44:22 +08:00
Dingyuan Wang
32a0e92a09 don't compile re every time; autopep8 2015-02-10 21:22:34 +08:00
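The commit above avoids compiling regular expressions on every call. A minimal sketch of the usual fix, compiling once at module level and reusing the pattern object; the pattern and the names `RE_HAN` and `split_blocks` are assumptions for illustration.

```python
import re

# Compiled once at import time and reused by every call below.
# Assumption: a simple CJK range, not the library's actual pattern.
RE_HAN = re.compile(r"([\u4e00-\u9fff]+)")


def split_blocks(sentence):
    """Split a sentence into alternating non-Han and Han blocks."""
    # The capturing group makes re.split keep the matched Han runs.
    return [block for block in RE_HAN.split(sentence) if block]
```

Python's `re` module also caches recently compiled patterns internally, so the gain comes mainly from skipping the cache lookup and keeping the pattern in one named place.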
Dingyuan Wang
22bcf8be7a Merge master and jieba3k, make the code Python 2/3 compatible 2015-02-10 20:54:55 +08:00