jerryday
e5e41a4aad
fix pair object in dict problem
2015-10-30 16:38:50 +08:00
jerryday
4f8ca83661
add a withFlag param to textrank
2015-10-30 15:40:41 +08:00
jerryday
26e339f8f7
add a withFlag param to extract_tags
2015-10-30 11:09:24 +08:00
fxsjy
cb414cb861
version update
2015-06-27 16:49:44 +08:00
Dingyuan Wang
66fe17517d
prevent moving across different filesystems at tempfile.mkstemp
2015-06-26 22:12:39 +08:00
Dingyuan Wang
be46ddef9a
use shutil.move for all platforms in case of different filesystems
2015-06-26 21:52:53 +08:00
Dingyuan Wang
ceb5c26be4
fix self.FREQ in cut_for_search; make pair object iterable
2015-06-01 14:36:38 +08:00
Dingyuan Wang
3b76328f2a
allow ignoring word frequency while providing pos tag
2015-05-23 21:51:00 +08:00
Dingyuan Wang
94840a734c
wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default
Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
Gilbert Liu
f6e57ab2ae
fixed an error in load_userdict()
2015-05-01 12:52:28 -07:00
fxsjy
29d2b838dc
a minor version on PyPI, which removes *.pyc
2015-04-17 19:35:12 +08:00
Wang Bin
84ffa0d4bf
exclude word fragments from FREQ
2015-04-02 11:06:55 +08:00
fxsjy
89481cfd84
version update 0.36
2015-03-20 11:00:55 +08:00
Dingyuan Wang
4a552ca94f
suggest word frequency, support passing str to add_word
2015-03-14 12:44:19 +08:00
Yuan-Yi Chang
62433a3205
Allow jieba to specify the directory where cache_file is generated, so that jieba can run in read-only file system environments
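1. Before calling jieba.cut() or related functions, set the directory via jieba.tmp_dir
2. When the application runs on a read-only file system, jieba can work normally if cache_file is generated in advance
3. Real-world cases are Google App Engine and Heroku: the free tier of the former has only 128MB of memory and cannot run jieba, while the free environment of the latter has 512MB and runs it normally. Before deploying, generate cache_file locally, then deploy it together with the application to Google App Engine or Heroku.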
2015-02-27 17:25:59 +08:00
Dingyuan Wang
f29430f49e
details in textrank; update README
2015-02-16 21:25:55 +08:00
zhangcheng
01b7f6efcf
improve some details based on other committers' advice
2015-02-16 20:35:45 +08:00
zhangcheng
8b8c6c85d0
remove unused import
2015-02-16 15:51:05 +08:00
zhangcheng
a6d1b2479e
build a stable sort for graph iteration so that results are stable, and adapt details for Python 3
2015-02-16 15:49:10 +08:00
zhangcheng
1152db7736
build a stable sort for graph iteration so that results are stable
2015-02-16 15:46:36 +08:00
fxsjy
49657c976d
make extract_tags behavior compatible with the previous version
2015-02-14 21:23:58 +08:00
fxsjy
abcaf3e475
fix bug: load_userdict
2015-02-14 19:56:38 +08:00
Jack
a06b7d388e
fix bug in __main__.py
2015-02-12 14:08:39 +08:00
Dingyuan Wang
f2b7183a71
use str.splitlines to avoid losing line breaks
2015-02-12 12:39:14 +08:00
Dingyuan Wang
f808ea0ebb
use only one dict to store words and prefixes
2015-02-12 10:31:52 +08:00
Dingyuan Wang
32a0e92a09
don't compile re every time; autopep8
2015-02-10 21:22:34 +08:00
Dingyuan Wang
22bcf8be7a
Merge master and jieba3k, make the code Python 2/3 compatible
2015-02-10 20:54:55 +08:00
Dingyuan Wang
4197dfb8fa
store int directly in FREQ; small improvements
2015-02-09 16:26:00 +08:00
Dingyuan Wang
765fd6b7f0
store int directly in FREQ; small improvements
2015-02-09 16:14:12 +08:00
Dingyuan Wang
7bcb128f5f
fix textrank division by zero; fix posseg.pair.__repr__
2014-12-20 00:12:42 +08:00
Lin
fea3aec6bd
Fix division-by-zero issue when words are not found in dict.
2014-12-05 17:13:12 +08:00
Dingyuan Wang
c6b386f65b
update jieba3k
2014-11-29 16:06:20 +08:00
Dingyuan Wang
7b7c6955a9
complete the setup.py, fix #202 problem in posseg
2014-11-29 15:33:42 +08:00
Nomaka
9cb76dd8b9
Update __init__.py
The idx parameter of calc is unused
2014-11-18 16:00:49 +08:00
walkskyer
a336e26403
Add an allowPOS parameter to the textrank function, and change the allowPOS parameter of extract_tags to be consistent with textrank.
2014-11-15 18:36:09 +08:00
walkskyer
bab5f362ba
Convert the allowPOS parameter of extract_tags to a frozenset to reduce lookup time.
2014-11-15 18:14:47 +08:00
fxsjy
447c1ded8c
fix problem for python3.2
2014-11-15 13:44:30 +08:00
walkskyer
d82d2c18df
Add part-of-speech filtering to the keyword extraction functions
2014-11-13 22:26:22 +08:00
fxsjy
315a411e52
version update
2014-11-13 10:43:43 +08:00
walkskyer
5571a0337a
Fix stop words handling that did not account for "\r", which prevented correct matching.
2014-11-12 22:33:27 +08:00
Dingyuan Wang
7a6caa0c3c
port extract_tags, etc to jieba3k; add auto2to3 script
2014-11-07 23:33:31 +08:00
Dingyuan Wang
751ff35eb5
improve extract_tags; unify extract_tags and textrank
2014-10-31 23:15:51 +08:00
Dingyuan Wang
e3f3dcccba
improve the loading and caching process
2014-10-31 21:56:09 +08:00
Dingyuan Wang
fd9f1f2c0e
update README, textrank, etc.
2014-10-25 14:23:37 +08:00
Dingyuan Wang
a6119cc995
add custom dictionary to __main__; update README; slightly optimize textrank
2014-10-25 12:59:36 +08:00
zhangcheng
6eb9f6149c
add a simple implementation of textrank
2014-10-24 21:15:54 +08:00
fxsjy
f5ca87e088
merge change of @fukuball
2014-10-23 15:59:08 +08:00
fxsjy
ba87fcb01f
remove trie, use prefix set instead
2014-10-20 14:08:09 +08:00
fxsjy
82bfffb6ed
version update to 0.34
2014-10-20 13:35:13 +08:00
Dingyuan Wang
bb1e6000c6
fix version; fix spaces at end of line
2014-10-19 10:57:46 +08:00