fxsjy
aa65031788
fix file mode
2020-01-13 21:03:38 +08:00
fxsjy
2eb11c8028
fix issue #810
2020-01-13 20:53:43 +08:00
JesseyXujin
d703bce302
paddle coredump exception fix ( #807 )
...
* paddle_null_point_fix
* add core exception note
* delete yield
* modify test paddle for supporting enable_paddle()
2020-01-10 16:30:46 +08:00
fxsjy
97c32464e1
fix issue #798
2020-01-03 14:10:48 +08:00
pkpk
27910094ac
Fix bugs in Paddle seg and Paddle postag ( #789 )
...
* fix bugs in paddle seg and paddle postag
* fix compat in checking paddle
2019-12-24 21:02:55 +08:00
fxsjy
478c3b9bb4
lazy import paddle
2019-12-24 19:19:51 +08:00
imzhengzx
ca444fb4da
fix the error about importing ChineseAnalyzer
...
Because of the interface change to ChineseAnalyzer, the line 'from jieba.analyse import ChineseAnalyzer' in this test file raises an ImportError such as "cannot import name 'ChineseAnalyzer'". Changing the import to 'from jieba.analyse.analyzer import ChineseAnalyzer' fixes it.
2018-09-15 11:59:01 +08:00
sunjunyi01
b4dd5b58f3
bug fix, issue: #511 , #512
2017-08-28 21:10:50 +08:00
huntzhan
60acefd9b1
Bugfix for HMM=False in parallelism.
2016-08-04 17:43:35 +08:00
Dingyuan Wang
99d0fb1a8a
use regex and fix encoding related issues in load_userdict
2015-11-09 20:54:50 +08:00
Dingyuan Wang
ceb5c26be4
fix self.FREQ in cut_for_search; make pair object iterable
2015-06-01 14:36:38 +08:00
Dingyuan Wang
3b76328f2a
allow ignoring word frequency while providing pos tag
2015-05-23 21:51:00 +08:00
Dingyuan Wang
94840a734c
wraps most globals in classes
...
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default
Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
Dingyuan Wang
4a552ca94f
suggest word frequency, support passing str to add_word
2015-03-14 12:44:19 +08:00
Dingyuan Wang
872a7039f2
Merge branch 'master' of https://github.com/fxsjy/jieba
2015-02-12 10:33:56 +08:00
Dingyuan Wang
f808ea0ebb
use only one dict to store words and prefixes
2015-02-12 10:31:52 +08:00
fxsjy
5bfa43a781
fix test scripts
2015-02-11 20:46:48 +08:00
Dingyuan Wang
f3a53dd2da
fix print() in tests
2015-02-11 20:45:55 +08:00
fxsjy
8cbb26a7b6
fix test_file.py
2015-02-11 16:47:57 +08:00
Dingyuan Wang
22bcf8be7a
Merge master and jieba3k, make the code Python 2/3 compatible
2015-02-10 20:54:55 +08:00
Dingyuan Wang
3dad899ec8
backport 2to3 scripts and changelog
2014-11-29 16:12:25 +08:00
Dingyuan Wang
c6b386f65b
update jieba3k
2014-11-29 16:06:20 +08:00
Dingyuan Wang
a5ecf70f71
update to v0.35
2014-11-14 20:59:54 +08:00
Dingyuan Wang
4a6140081e
fix problems in auto2to3
2014-11-07 23:47:57 +08:00
Dingyuan Wang
7a6caa0c3c
port extract_tags, etc to jieba3k; add auto2to3 script
2014-11-07 23:33:31 +08:00
walkskyer
6772f0282e
Fix the incorrect call order when the weighted test script prints its results
2014-11-06 22:24:43 +08:00
Dingyuan Wang
fd9f1f2c0e
update README, textrank, etc.
2014-10-25 14:23:37 +08:00
fxsjy
f5ca87e088
merge change of @fukuball
2014-10-23 15:59:08 +08:00
Dingyuan Wang
bb1e6000c6
fix version; fix spaces at end of line
2014-10-19 10:57:46 +08:00
Dingyuan Wang
51df77831b
use prefix dict instead of trie, add a command line interface, and a few small improvements
2014-10-18 22:23:26 +08:00
Dingyuan Wang
6fad5fbb2c
update to v0.33
2014-09-06 23:28:47 +08:00
Fukuball Lin
b658ee69cb
Let jieba add its own stop words corpus
...
1. Add a sample stop words corpus
2. To let jieba switch stop words corpora, add a set_stop_words method and rewrite extract_tags
3. Add the extract_tags_stop_words.py example to test
2014-08-06 03:35:16 +08:00
Fukuball Lin
7198d562f1
Let jieba switch idf corpora
...
1. Add a Traditional Chinese idf corpus
2. To let jieba switch idf corpora, add get_idf and set_idf_path methods and rewrite extract_tags
3. Add extract_tags_idfpath to test
2014-08-05 22:55:13 +08:00
Dingyuan Wang
c04ccd0d12
Update to v0.32 according to the master branch.
2014-06-14 22:31:13 +08:00
fxsjy
18678d50c6
fix bug issue #132
2014-01-28 13:48:03 +08:00
gan
31d5845535
add better support for English, e.g. input 'this is interesting and interested me' --> output 'this interest interest', where 'interest' matches both 'interesting' and 'interested'
2013-09-09 11:54:30 +08:00
Sun Junyi
7e7fcc1184
add an option to disable HMM
2013-09-05 17:09:27 +08:00
ZoeyYoung
d49542c06e
fix bug
2013-08-21 19:31:12 +08:00
ZoeyYoung
dce353f88b
merge from master
2013-08-21 15:32:46 +08:00
ZoeyYoung
2857ae45cc
Merge branch 'master' into jieba3k
...
Conflicts:
Changelog
jieba/__init__.py
jieba/finalseg/__init__.py
jieba/posseg/__init__.py
setup.py
test/parallel/test_file.py
test/test_file.py
2013-08-21 13:55:21 +08:00
Sun Junyi
81390a2d23
test_file.py: close the file object
2013-08-02 15:51:33 +08:00
fxsjy
b77645b3aa
modify test_file.py; use less memory
2013-07-29 10:17:39 +08:00
Linker Lin
5d83855088
Automatically detect the number of CPUs and start an appropriate number of processes.
2013-07-28 00:12:00 +08:00
Linker Lin
2ceb981da0
Automatically detect the number of CPUs and start an appropriate number of processes.
2013-07-28 00:07:29 +08:00
Sun Junyi
6549deabbd
merge change from master
2013-07-16 11:06:41 +08:00
Cheng wei
6035bb6320
fix invalid syntax for python3
2013-07-06 02:52:17 +08:00
Sun Junyi
9d0ea771a5
fix bug; decimals & digit-english mixed
2013-07-05 16:16:49 +08:00
Sun Junyi
ba5114dc95
update whoosh example
2013-07-04 09:31:09 +08:00
Sun Junyi
f424862222
clean the files in tmp
2013-07-03 17:55:01 +08:00
Sun Junyi
b18d56d2a3
Merge pull request #72 from linkerlin/master
...
Add a tmp directory so that test_whoosh.py can run.
2013-07-03 02:52:46 -07:00