Dingyuan Wang
|
32a0e92a09
|
don't compile re every time; autopep8
|
2015-02-10 21:22:34 +08:00 |
|
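The commit above moves regular-expression compilation out of the hot path. Below is a minimal sketch of that pattern. `RE_HAN` and `split_han_blocks` are illustrative names, not jieba's actual code. The regex is compiled once at import time and reused on every call.

```python
import re

# Compiled once when the module is imported and reused by every call.
# The pattern (runs of CJK ideographs) is an illustrative stand-in.
RE_HAN = re.compile(r"([\u4E00-\u9FD5]+)")


def split_han_blocks(sentence):
    """Split text into runs of CJK characters and runs of everything else."""
    # The capturing group keeps the CJK runs in the result; drop empty strings.
    return [blk for blk in RE_HAN.split(sentence) if blk]
```

Python's `re` module also keeps an internal cache of recently compiled patterns. Most of the gain therefore comes from skipping that cache lookup on every call.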
Dingyuan Wang
|
22bcf8be7a
|
Merge master and jieba3k, make the code Python 2/3 compatible
|
2015-02-10 20:54:55 +08:00 |
|
Dingyuan Wang
|
4197dfb8fa
|
store int directly in FREQ; small improvements
|
2015-02-09 16:26:00 +08:00 |
|
Dingyuan Wang
|
765fd6b7f0
|
store int directly in FREQ; small improvements
|
2015-02-09 16:14:12 +08:00 |
|
Dingyuan Wang
|
7bcb128f5f
|
fix textrank division by zero; fix posseg.pair.__repr__
|
2014-12-20 00:12:42 +08:00 |
|
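The division-by-zero fixes in this part of the log are easier to follow next to a generic TextRank sketch, which runs weighted PageRank over a word co-occurrence graph. This is not jieba's `textrank` code. The window size, damping factor, iteration count, and max-based normalization were all chosen for the example. A zero denominator can appear in two places: a node's total edge weight and the final normalizer. The sketch returns early when the graph has no edges, so every remaining node has a positive total weight.

```python
from collections import defaultdict


def textrank(words, window=5, d=0.85, iterations=10):
    """Score words with weighted PageRank over an undirected co-occurrence graph."""
    graph = defaultdict(lambda: defaultdict(float))
    # Link each word to the differing words that follow it within the window.
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w][words[j]] += 1.0
                graph[words[j]][w] += 1.0
    if not graph:
        return {}  # no edges: nothing to rank and nothing safe to divide by
    # Every node has at least one edge, so each out_sum is strictly positive.
    out_sum = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    ws = {n: 1.0 / len(graph) for n in graph}
    for _ in range(iterations):
        ws = {
            n: (1 - d) + d * sum(wt / out_sum[m] * ws[m] for m, wt in graph[n].items())
            for n in graph
        }
    # Every score is at least (1 - d) > 0, so the normalizer is never zero.
    top = max(ws.values())
    return {n: s / top for n, s in ws.items()}
```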
Lin
|
fea3aec6bd
|
Fix a division-by-zero issue when words are not found in the dict.
|
2014-12-05 17:13:12 +08:00 |
|
Dingyuan Wang
|
c6b386f65b
|
update jieba3k
|
2014-11-29 16:06:20 +08:00 |
|
Dingyuan Wang
|
7b7c6955a9
|
complete the setup.py, fix #202 problem in posseg
|
2014-11-29 15:33:42 +08:00 |
|
Nomaka
|
9cb76dd8b9
|
Update __init__.py
The idx parameter of calc is unused.
|
2014-11-18 16:00:49 +08:00 |
|
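The commit above concerns `calc`. In jieba, `calc` is generally understood to choose the maximum-probability segmentation over the word DAG. The sketch below shows that kind of right-to-left dynamic programming. The names `best_path`, `dag`, `freq`, and `total` are illustrative, and giving unknown words a count of 1 is an assumption. No separate index parameter is needed, because the loop index alone drives the recurrence.

```python
import math


def best_path(sentence, dag, freq, total):
    """Return the segmentation that maximizes the sum of log word probabilities.

    dag[i] lists the end indices j such that sentence[i:j + 1] is a candidate word.
    route[i] holds (best log-prob of sentence[i:], end index of its first word).
    """
    n = len(sentence)
    log_total = math.log(total)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        # Missing or zero counts fall back to 1 so that log() stays defined.
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1]) or 1) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    # Walk the stored choices from left to right to recover the words.
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words
```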
walkskyer
|
a336e26403
|
Add an allowPOS parameter to textrank, and change the allowPOS parameter of extract_tags to be consistent with textrank.
|
2014-11-15 18:36:09 +08:00 |
|
walkskyer
|
bab5f362ba
|
Convert the allowPOS parameter of extract_tags to a frozenset to reduce lookup time.
|
2014-11-15 18:14:47 +08:00 |
|
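The POS-filtering commits in this part of the log add `allowPOS` and turn it into a frozenset. A small helper can illustrate both changes. This is a sketch under assumptions: `filter_by_pos` is a hypothetical name, the input is a list of `(word, flag)` pairs, and an empty `allow_pos` means no filtering. With a frozenset, each membership test is an average O(1) hash lookup instead of a linear scan of a list or tuple.

```python
def filter_by_pos(tagged_words, allow_pos=()):
    """Keep words whose POS flag is in allow_pos; keep every word if it is empty."""
    allowed = frozenset(allow_pos)  # hashed once, then O(1) average lookups
    if not allowed:
        return [w for w, _ in tagged_words]
    return [w for w, flag in tagged_words if flag in allowed]
```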
fxsjy
|
447c1ded8c
|
fix problem for python3.2
|
2014-11-15 13:44:30 +08:00 |
|
walkskyer
|
d82d2c18df
|
Add part-of-speech filtering to the keyword extraction functions.
|
2014-11-13 22:26:22 +08:00 |
|
fxsjy
|
315a411e52
|
version update
|
2014-11-13 10:43:43 +08:00 |
|
walkskyer
|
5571a0337a
|
Fix stop words failing to match because "\r" was not handled.
|
2014-11-12 22:33:27 +08:00 |
|
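The `\r` bug above is typical of reading a file with Windows line endings and splitting only on `\n`. Below is a minimal sketch of a tolerant parser. It assumes one stop word per line, and `parse_stop_words` is a hypothetical name.

```python
def parse_stop_words(text):
    """Parse one stop word per line, tolerating both LF and CRLF line endings."""
    # splitlines() treats \n, \r\n and \r as boundaries; strip() drops leftovers.
    return {line.strip() for line in text.splitlines() if line.strip()}
```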
Dingyuan Wang
|
7a6caa0c3c
|
port extract_tags, etc to jieba3k; add auto2to3 script
|
2014-11-07 23:33:31 +08:00 |
|
Dingyuan Wang
|
751ff35eb5
|
improve extract_tags; unify extract_tags and textrank
|
2014-10-31 23:15:51 +08:00 |
|
Dingyuan Wang
|
e3f3dcccba
|
improve the loading and caching process
|
2014-10-31 21:56:09 +08:00 |
|
Dingyuan Wang
|
fd9f1f2c0e
|
update README, textrank, etc.
|
2014-10-25 14:23:37 +08:00 |
|
Dingyuan Wang
|
a6119cc995
|
add custom dictionary to __main__; update README; slightly optimize textrank
|
2014-10-25 12:59:36 +08:00 |
|
zhangcheng
|
6eb9f6149c
|
add a simple implementation of textrank
|
2014-10-24 21:15:54 +08:00 |
|
fxsjy
|
f5ca87e088
|
merge change of @fukuball
|
2014-10-23 15:59:08 +08:00 |
|
fxsjy
|
ba87fcb01f
|
remove trie, use prefix set instead
|
2014-10-20 14:08:09 +08:00 |
|
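The change from a trie to a prefix dictionary can be sketched as follows. Each dictionary word maps to its frequency, and each proper prefix of a word maps to 0. A plain dict lookup then tells whether a fragment can still grow into a word. This is an illustrative sketch with toy frequencies and hypothetical `build_prefix_dict` and `build_dag` names, not jieba's source.

```python
def build_prefix_dict(word_freqs):
    """Map every word to its frequency and every proper prefix to 0."""
    pfdict = {}
    for word, freq in word_freqs.items():
        pfdict[word] = freq  # a real word always keeps its frequency
        for end in range(1, len(word)):
            pfdict.setdefault(word[:end], 0)  # never overwrite a real word
    return pfdict


def build_dag(sentence, pfdict):
    """For each start index, list the end indices of dictionary words found there."""
    dag = {}
    n = len(sentence)
    for k in range(n):
        ends = []
        i = k
        frag = sentence[k]
        # Keep extending while the fragment is still a known word or prefix.
        while i < n and frag in pfdict:
            if pfdict[frag]:
                ends.append(i)
            i += 1
            frag = sentence[k:i + 1]
        if not ends:
            ends.append(k)  # fall back to a single character
        dag[k] = ends
    return dag
```

For example, with the words 北京, 北京大学 and 大学, `build_dag("北京大学", ...)` yields `{0: [1, 3], 1: [1], 2: [3], 3: [3]}`, so a word starting at position 0 can end at index 1 or 3.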
fxsjy
|
82bfffb6ed
|
version update to 0.34
|
2014-10-20 13:35:13 +08:00 |
|
Dingyuan Wang
|
bb1e6000c6
|
fix version; fix spaces at end of line
|
2014-10-19 10:57:46 +08:00 |
|
Dingyuan Wang
|
14671d4feb
|
fix __main__.py
|
2014-10-19 10:41:09 +08:00 |
|
Dingyuan Wang
|
b367690eeb
|
use prefix dict instead of trie, add a command line interface, and a few small improvements
|
2014-10-19 10:32:23 +08:00 |
|
Dingyuan Wang
|
51df77831b
|
use prefix dict instead of trie, add a command line interface, and a few small improvements
|
2014-10-18 22:23:26 +08:00 |
|
fxsjy
|
eb98eb9248
|
fix performance problem of extract_tags
|
2014-10-10 15:41:28 +08:00 |
|
keroro520
|
77b442fa88
|
fix issues (https://github.com/fxsjy/jieba/issues/125)
|
2014-09-12 13:42:05 +08:00 |
|
Dingyuan Wang
|
626b415152
|
fix dict.itervalues mistake
|
2014-09-07 19:21:13 +08:00 |
|
Dingyuan Wang
|
6a3f228c72
|
fix python3 stuff
|
2014-09-07 18:50:10 +08:00 |
|
Dingyuan Wang
|
b16cf0d63f
|
fix indent typo
|
2014-09-06 23:37:54 +08:00 |
|
Dingyuan Wang
|
6fad5fbb2c
|
update to v0.33
|
2014-09-06 23:28:47 +08:00 |
|
Fukuball Lin
|
b658ee69cb
|
Allow jieba to use a custom stop words corpus
1. Add a sample stop words corpus
2. Add a set_stop_words method and rewrite extract_tags so that jieba can switch stop words corpora
3. Add the extract_tags_stop_words.py example to test
|
2014-08-06 03:35:16 +08:00 |
|
Fukuball Lin
|
7198d562f1
|
Allow jieba to switch idf corpora
1. Add a Traditional Chinese idf corpus
2. Add get_idf and set_idf_path methods and rewrite extract_tags so that jieba can switch idf corpora
3. Add extract_tags_idfpath to test
|
2014-08-05 22:55:13 +08:00 |
|
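The idf and stop-word switching commits above suggest a TF-IDF keyword extractor whose data can be swapped out. The sketch below rests on several assumptions. The idf file holds one `word idf` pair per line. Words missing from the table get the median IDF (the upper middle value for even counts). Single-character words and stop words are skipped. The function names are hypothetical and are not jieba's `set_idf_path` or `set_stop_words` API.

```python
def load_idf(text):
    """Parse 'word idf' lines into a dict and return it with the median IDF."""
    idf = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            idf[parts[0]] = float(parts[1])
    values = sorted(idf.values())
    median = values[len(values) // 2] if values else 0.0
    return idf, median


def tfidf_keywords(words, idf, default_idf, stop_words=frozenset(), top_k=3):
    """Rank words by term frequency times IDF, using default_idf for unknown words."""
    freq = {}
    for w in words:
        if len(w.strip()) < 2 or w.lower() in stop_words:
            continue
        freq[w] = freq.get(w, 0.0) + 1.0
    total = sum(freq.values())
    if not total:
        return []  # every word was filtered out; avoid dividing by zero
    scores = {w: c / total * idf.get(w, default_idf) for w, c in freq.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```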
Dingyuan Wang
|
8b07bce568
|
fix the u'xxx' string.
|
2014-06-21 23:30:06 +08:00 |
|
Dingyuan Wang
|
c04ccd0d12
|
Update to v0.32 according to the master branch.
|
2014-06-14 22:31:13 +08:00 |
|
Dingyuan Wang
|
81f77d7a08
|
Fix the re in enable_parallel.
|
2014-06-14 15:22:13 +08:00 |
|
ShuraChow
|
7583f7760a
|
fix issue #161
On each call, posseg checks the length of jieba.user_word_tag_tab to determine whether new words have been loaded; if so, it updates word_tag_tab and then clears jieba.user_word_tag_tab.
|
2014-06-10 02:04:09 +08:00 |
|
davidlihm
|
5b2ec920ed
|
Update __init__.py
|
2014-05-15 07:55:11 +08:00 |
|
jagt
|
7f3513edb7
|
close cache file to avoid warning message.
|
2014-04-24 00:35:09 +08:00 |
|
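Closing the cache file deterministically avoids the `ResourceWarning` that Python 3 can emit for unclosed files. The minimal sketch below assumes the cache is a dict serialized with `marshal`, which is an assumption made for the example. `save_cache` and `load_cache` are hypothetical names.

```python
import marshal


def save_cache(obj, path):
    """Serialize obj with marshal; the with-block closes the file even on error."""
    with open(path, "wb") as f:
        marshal.dump(obj, f)


def load_cache(path):
    """Read back an object written by save_cache, closing the file afterwards."""
    with open(path, "rb") as f:
        return marshal.load(f)
```

The `with` statement closes the file when the block exits, including when `marshal.dump` raises.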
wind
|
7488b114e7
|
use logging instead of print in init file
|
2014-03-20 13:48:33 +13:00 |
|
fxsjy
|
2682e887b8
|
Merge branch 'master' of https://github.com/fxsjy/jieba
|
2014-03-02 17:52:52 +08:00 |
|
fxsjy
|
9d4ac26f16
|
fix the bug of issue#137
|
2014-03-02 17:52:19 +08:00 |
|
Sun Junyi
|
3e430e9769
|
Update __init__.py
|
2014-02-16 20:09:57 +08:00 |
|
Honghe Wu
|
7720fbc1d8
|
fix a bug that prevented importing ChineseAnalyzer; change tabs to 4 white spaces per PEP8
|
2014-02-15 19:32:29 +08:00 |
|
fxsjy
|
dafc73425e
|
fix a little problem of dict.txt
|
2014-02-07 14:35:38 +08:00 |
|
fxsjy
|
7cc7e70843
|
Merge branch 'master' of https://github.com/fxsjy/jieba
|
2014-01-28 13:48:35 +08:00 |
|
fxsjy
|
18678d50c6
|
fix bug issue #132
|
2014-01-28 13:48:03 +08:00 |
|