245 Commits

Author SHA1 Message Date
zhangcheng
01b7f6efcf improve some details based on other committers' advice 2015-02-16 20:35:45 +08:00
zhangcheng
8b8c6c85d0 remove unused import 2015-02-16 15:51:05 +08:00
zhangcheng
a6d1b2479e build stable sort for graph iteration so that results are stable, and adapt details for Python 3 2015-02-16 15:49:10 +08:00
zhangcheng
1152db7736 build stable sort for graph iteration so that results are stable 2015-02-16 15:46:36 +08:00
fxsjy
49657c976d make extract_tags behavior compatible with previous version 2015-02-14 21:23:58 +08:00
fxsjy
abcaf3e475 fix bug: load_userdict 2015-02-14 19:56:38 +08:00
Jack
a06b7d388e fix bug in __main__.py 2015-02-12 14:08:39 +08:00
Dingyuan Wang
f2b7183a71 use str.splitlines to avoid losing line breaks 2015-02-12 12:39:14 +08:00
Dingyuan Wang
f808ea0ebb use only one dict to store words and prefixes 2015-02-12 10:31:52 +08:00
Dingyuan Wang
32a0e92a09 don't compile re every time; autopep8 2015-02-10 21:22:34 +08:00
Dingyuan Wang
22bcf8be7a Merge master and jieba3k, make the code Python 2/3 compatible 2015-02-10 20:54:55 +08:00
Dingyuan Wang
4197dfb8fa store int directly in FREQ; small improvements 2015-02-09 16:26:00 +08:00
Dingyuan Wang
765fd6b7f0 store int directly in FREQ; small improvements 2015-02-09 16:14:12 +08:00
Dingyuan Wang
7bcb128f5f fix textrank division by zero; fix posseg.pair.__repr__ 2014-12-20 00:12:42 +08:00
Lin
fea3aec6bd Fix division-by-zero issue when words are not found in dict. 2014-12-05 17:13:12 +08:00
Dingyuan Wang
c6b386f65b update jieba3k 2014-11-29 16:06:20 +08:00
Dingyuan Wang
7b7c6955a9 complete the setup.py, fix #202 problem in posseg 2014-11-29 15:33:42 +08:00
Nomaka
9cb76dd8b9 Update __init__.py
The idx parameter of calc is unused
2014-11-18 16:00:49 +08:00
walkskyer
a336e26403 Add an allowPOS parameter to the textrank function, and change the allowPOS parameter of extract_tags to be consistent with textrank. 2014-11-15 18:36:09 +08:00
walkskyer
bab5f362ba Convert the allowPOS parameter of extract_tags to a frozenset to reduce lookup time. 2014-11-15 18:14:47 +08:00
fxsjy
447c1ded8c fix problem for python3.2 2014-11-15 13:44:30 +08:00
walkskyer
d82d2c18df Add part-of-speech filtering to the keyword extraction functions 2014-11-13 22:26:22 +08:00
fxsjy
315a411e52 version update 2014-11-13 10:43:43 +08:00
walkskyer
5571a0337a Fix stop words handling that did not account for "\r", which prevented correct matching. 2014-11-12 22:33:27 +08:00
Dingyuan Wang
7a6caa0c3c port extract_tags, etc to jieba3k; add auto2to3 script 2014-11-07 23:33:31 +08:00
Dingyuan Wang
751ff35eb5 improve extract_tags; unify extract_tags and textrank 2014-10-31 23:15:51 +08:00
Dingyuan Wang
e3f3dcccba improve the loading and caching process 2014-10-31 21:56:09 +08:00
Dingyuan Wang
fd9f1f2c0e update README, textrank, etc. 2014-10-25 14:23:37 +08:00
Dingyuan Wang
a6119cc995 add custom dictionary to __main__; update README; slightly optimize textrank 2014-10-25 12:59:36 +08:00
zhangcheng
6eb9f6149c add a simple implementation of textrank 2014-10-24 21:15:54 +08:00
fxsjy
f5ca87e088 merge change of @fukuball 2014-10-23 15:59:08 +08:00
fxsjy
ba87fcb01f remove trie, use prefix set instead 2014-10-20 14:08:09 +08:00
fxsjy
82bfffb6ed version update to 0.34 2014-10-20 13:35:13 +08:00
Dingyuan Wang
bb1e6000c6 fix version; fix spaces at end of line 2014-10-19 10:57:46 +08:00
Dingyuan Wang
14671d4feb fix __main__.py 2014-10-19 10:41:09 +08:00
Dingyuan Wang
b367690eeb use prefix dict instead of trie, add a command line interface, and a few small improvements 2014-10-19 10:32:23 +08:00
Dingyuan Wang
51df77831b use prefix dict instead of trie, add a command line interface, and a few small improvements 2014-10-18 22:23:26 +08:00
fxsjy
eb98eb9248 fix performance problem of extract_tags 2014-10-10 15:41:28 +08:00
keroro520
77b442fa88 fix issues (https://github.com/fxsjy/jieba/issues/125) 2014-09-12 13:42:05 +08:00
Dingyuan Wang
626b415152 fix dict.itervalues mistake 2014-09-07 19:21:13 +08:00
Dingyuan Wang
6a3f228c72 fix python3 stuff 2014-09-07 18:50:10 +08:00
Dingyuan Wang
b16cf0d63f fix indent typo 2014-09-06 23:37:54 +08:00
Dingyuan Wang
6fad5fbb2c update to v0.33 2014-09-06 23:28:47 +08:00
Fukuball Lin
b658ee69cb Let jieba add its own stop words corpus
1. Add an example stop words corpus
2. To let jieba switch stop words corpora, add a set_stop_words method and rewrite extract_tags
3. Add the extract_tags_stop_words.py test example to test
2014-08-06 03:35:16 +08:00
Fukuball Lin
7198d562f1 Let jieba switch idf corpora
1. Add a Traditional Chinese idf corpus
2. To let jieba switch idf corpora, add get_idf and set_idf_path methods and rewrite extract_tags
3. Add extract_tags_idfpath to test
2014-08-05 22:55:13 +08:00
Dingyuan Wang
8b07bce568 fix the u'xxx' string. 2014-06-21 23:30:06 +08:00
Dingyuan Wang
c04ccd0d12 Update to v0.32 according to the master branch. 2014-06-14 22:31:13 +08:00
Dingyuan Wang
81f77d7a08 Fix the re in enable_parallel. 2014-06-14 15:22:13 +08:00
ShuraChow
7583f7760a fix issue #161
On each call, posseg checks the length of jieba.user_word_tag_tab to determine whether new words have been loaded; if so, it updates word_tag_tab and then clears jieba.user_word_tag_tab
2014-06-10 02:04:09 +08:00
davidlihm
5b2ec920ed Update __init__.py 2014-05-15 07:55:11 +08:00