109 Commits

Author SHA1 Message Date
pkpk
27910094ac Fix bugs in Paddle seg and Paddle postag (#789)
* fix bugs in paddle seg and paddle postag

* fix compat in checking paddle
2019-12-24 21:02:55 +08:00
fxsjy
478c3b9bb4 lazy import paddle 2019-12-24 19:19:51 +08:00
imzhengzx
ca444fb4da
fix the error about importing ChineseAnalyzer
Because of the interface change to ChineseAnalyzer, the code 'from jieba.analyse import ChineseAnalyzer' in this test file raises an ImportError such as "cannot import name 'ChineseAnalyzer'". Changing the import to 'from jieba.analyse.analyzer import ChineseAnalyzer' fixes it.
2018-09-15 11:59:01 +08:00
sunjunyi01
b4dd5b58f3 bug fix, issue: #511, #512 2017-08-28 21:10:50 +08:00
huntzhan
60acefd9b1 Bugfix for HMM=False in parallelism. 2016-08-04 17:43:35 +08:00
Dingyuan Wang
99d0fb1a8a use regex and fix encoding related issues in load_userdict 2015-11-09 20:54:50 +08:00
Dingyuan Wang
ceb5c26be4 fix self.FREQ in cut_for_search; make pair object iterable 2015-06-01 14:36:38 +08:00
Dingyuan Wang
3b76328f2a allow ignoring word frequency while providing pos tag 2015-05-23 21:51:00 +08:00
Dingyuan Wang
94840a734c wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default

Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
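Note on the commit above: the API change maps module-level functions onto a default tokenizer instance and adds list-returning variants of the generator functions. The sketch below illustrates that pattern only. The `Tokenizer` class here is a toy that splits on a delimiter and is an assumption for illustration, not jieba's implementation; only the names `Tokenizer`, `dt`, `cut`, and `lcut` are taken from the commit message.

```python
class Tokenizer:
    """Toy stand-in that splits on a delimiter; NOT jieba's segmentation."""

    def __init__(self, delimiter=" "):
        self.delimiter = delimiter

    def cut(self, sentence):
        # Generator API: yields non-empty tokens one at a time.
        for token in sentence.split(self.delimiter):
            if token:
                yield token

    def lcut(self, sentence):
        # List-returning convenience wrapper around cut().
        return list(self.cut(sentence))


# Default instance; the module-level functions are its bound methods,
# so existing callers of cut() keep working without creating a Tokenizer.
dt = Tokenizer()
cut = dt.cut
lcut = dt.lcut
```

Callers that need separate settings can build their own instance, for example `Tokenizer(",").lcut("a,b")`, without touching the shared default.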
Dingyuan Wang
4a552ca94f suggest word frequency, support passing str to add_word 2015-03-14 12:44:19 +08:00
Dingyuan Wang
872a7039f2 Merge branch 'master' of https://github.com/fxsjy/jieba 2015-02-12 10:33:56 +08:00
Dingyuan Wang
f808ea0ebb use only one dict to store words and prefixes 2015-02-12 10:31:52 +08:00
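Note on the commit above: one way to keep words and prefixes in a single dict is to store each word with its frequency and each proper prefix that is not itself a word with frequency 0. The sketch below assumes that scheme; the function names `build_prefix_dict` and `words_starting_at` are hypothetical and the code is an illustration, not the code from this commit.

```python
def build_prefix_dict(word_freqs):
    """Return one dict holding words (freq > 0) and bare prefixes (freq 0)."""
    freq = {}
    for word, count in word_freqs.items():
        freq[word] = count
        # Register every proper prefix so lookups can stop early.
        for end in range(1, len(word)):
            prefix = word[:end]
            if prefix not in freq:
                freq[prefix] = 0
    return freq


def words_starting_at(sentence, start, freq):
    """List the dictionary words that begin at position `start`."""
    found = []
    end = start + 1
    fragment = sentence[start:end]
    # Extend the fragment while it is still a known word or prefix.
    while end <= len(sentence) and fragment in freq:
        if freq[fragment] > 0:
            found.append(fragment)
        end += 1
        fragment = sentence[start:end]
    return found
```

A fragment missing from the dict means no longer word can start there, which gives the same early exit a trie provides while using only a flat dict.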
fxsjy
5bfa43a781 fix test scripts 2015-02-11 20:46:48 +08:00
Dingyuan Wang
f3a53dd2da fix print() in tests 2015-02-11 20:45:55 +08:00
fxsjy
8cbb26a7b6 fix test_file.py 2015-02-11 16:47:57 +08:00
Dingyuan Wang
22bcf8be7a Merge master and jieba3k, make the code Python 2/3 compatible 2015-02-10 20:54:55 +08:00
Dingyuan Wang
3dad899ec8 backport 2to3 scripts and changelog 2014-11-29 16:12:25 +08:00
Dingyuan Wang
c6b386f65b update jieba3k 2014-11-29 16:06:20 +08:00
Dingyuan Wang
a5ecf70f71 update to v0.35 2014-11-14 20:59:54 +08:00
Dingyuan Wang
4a6140081e fix problems in auto2to3 2014-11-07 23:47:57 +08:00
Dingyuan Wang
7a6caa0c3c port extract_tags, etc to jieba3k; add auto2to3 script 2014-11-07 23:33:31 +08:00
walkskyer
6772f0282e Fix incorrect call order in the output of the weighted test script 2014-11-06 22:24:43 +08:00
Dingyuan Wang
fd9f1f2c0e update README, textrank, etc. 2014-10-25 14:23:37 +08:00
fxsjy
f5ca87e088 merge change of @fukuball 2014-10-23 15:59:08 +08:00
Dingyuan Wang
bb1e6000c6 fix version; fix spaces at end of line 2014-10-19 10:57:46 +08:00
Dingyuan Wang
51df77831b use prefix dict instead of trie, add a command line interface, and a few small improvements 2014-10-18 22:23:26 +08:00
Dingyuan Wang
6fad5fbb2c update to v0.33 2014-09-06 23:28:47 +08:00
Fukuball Lin
b658ee69cb Allow jieba to use a custom stop words corpus
1. Add a sample stop words corpus
2. To let jieba switch stop words corpora, add a set_stop_words method and rewrite extract_tags
3. Add the extract_tags_stop_words.py example to test
2014-08-06 03:35:16 +08:00
Fukuball Lin
7198d562f1 Allow jieba to switch the idf corpus
1. Add a Traditional Chinese idf corpus
2. To let jieba switch idf corpora, add get_idf and set_idf_path methods and rewrite extract_tags
3. Add extract_tags_idfpath to test
2014-08-05 22:55:13 +08:00
Dingyuan Wang
c04ccd0d12 Update to v0.32 according to the master branch. 2014-06-14 22:31:13 +08:00
fxsjy
18678d50c6 fix bug issue #132 2014-01-28 13:48:03 +08:00
gan
31d5845535 add better support for English, e.g. input 'this is interesting and interested me' --> output 'this interest interest', where 'interest' matches both 'interesting' and 'interested' 2013-09-09 11:54:30 +08:00
Sun Junyi
7e7fcc1184 add an option to disable HMM 2013-09-05 17:09:27 +08:00
ZoeyYoung
d49542c06e fix bug 2013-08-21 19:31:12 +08:00
ZoeyYoung
dce353f88b merge from master 2013-08-21 15:32:46 +08:00
ZoeyYoung
2857ae45cc Merge branch 'master' into jieba3k
Conflicts:
	Changelog
	jieba/__init__.py
	jieba/finalseg/__init__.py
	jieba/posseg/__init__.py
	setup.py
	test/parallel/test_file.py
	test/test_file.py
2013-08-21 13:55:21 +08:00
Sun Junyi
81390a2d23 test_file.py: close the file object 2013-08-02 15:51:33 +08:00
fxsjy
b77645b3aa modify test_file.py; use less memory 2013-07-29 10:17:39 +08:00
Linker Lin
5d83855088 Automatically detect the number of CPUs and start an appropriate number of processes. 2013-07-28 00:12:00 +08:00
Linker Lin
2ceb981da0 Automatically detect the number of CPUs and start an appropriate number of processes. 2013-07-28 00:07:29 +08:00
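Note on the two commits above: choosing a worker count from the detected CPU count can be done with the standard library. This is a minimal sketch under that assumption, not the code from these commits; the helper name `default_process_count` is hypothetical.

```python
import multiprocessing


def default_process_count(requested=None):
    """Return the requested worker count, or the detected CPU count."""
    if requested is not None:
        # Never start fewer than one worker process.
        return max(1, requested)
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        # cpu_count() may be unavailable on some platforms; fall back to 1.
        return 1
```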
Sun Junyi
6549deabbd merge change from master 2013-07-16 11:06:41 +08:00
Cheng wei
6035bb6320 fix invalid syntax for python3 2013-07-06 02:52:17 +08:00
Sun Junyi
9d0ea771a5 fix bug; decimals & digit-english mixed 2013-07-05 16:16:49 +08:00
Sun Junyi
ba5114dc95 update whoosh example 2013-07-04 09:31:09 +08:00
Sun Junyi
f424862222 clean the files in tmp 2013-07-03 17:55:01 +08:00
Sun Junyi
b18d56d2a3 Merge pull request #72 from linkerlin/master
Add a tmp directory so that test_whoosh.py can run.
2013-07-03 02:52:46 -07:00
Sun Junyi
b9b1f1a418 fix conflict of merging 2013-07-03 17:47:45 +08:00
miao.lin
becd32b178 made test_whoosh.py happy.
Add a tmp directory so that test_whoosh.py can run.
2013-07-03 17:32:35 +08:00
Sun Junyi
c01680c6a8 merge the new file 2013-07-03 17:29:33 +08:00
Sun Junyi
b62f052927 PEP8 2013-07-03 17:21:21 +08:00