73 Commits

Author SHA1 Message Date
pkpk
27910094ac Fix bugs in Paddle seg and Paddle postag (#789)
* fix bugs in paddle seg and paddle postag

* fix compat in checking paddle
2019-12-24 21:02:55 +08:00
JesseyXujin
5b3bb4b7f2 Add Paddle word segmentation and POS tagging (#788)
* paddle cut release

* update README.md to prompt users to install paddlepaddle.tiny

* remove the UTF header from the two init.py files

* revise README details
2019-12-24 17:27:41 +08:00
Sun Junyi
3d29b0c8e8 Merge pull request #310 from gumblex/master
Fix compatibility problem with `with` statement
2015-11-13 14:22:50 +08:00
Dingyuan Wang
1fcd3a417c fix compatibility problem with `with` statement 2015-11-13 13:16:19 +08:00
Sun Junyi
093980647b Merge pull request #303 from jerryday/master
add a withFlag param to extract_tags
2015-11-13 10:19:53 +08:00
Dingyuan Wang
8814e08f9b load default dictionary from pkg_resources and improve the loading method;
change the serialized models from marshal to pickle
2015-11-12 20:18:09 +08:00
Dingyuan Wang
1c33252fce change the recognized Chinese character range to [\u4E00-\u9FD5] 2015-11-09 20:23:43 +08:00
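The character range named in the commit above can be tried out with Python's `re` module. The following is a minimal sketch, not jieba's actual pattern: the names `RE_HAN` and `split_han_blocks` are hypothetical, and the pattern contains only the range quoted in the commit message.

```python
import re

# Hypothetical pattern built only from the range quoted in the commit message.
RE_HAN = re.compile("([\u4E00-\u9FD5]+)")


def split_han_blocks(text):
    """Split text into alternating runs inside and outside the range."""
    # The capturing group makes re.split keep the matched runs as well;
    # empty strings at the edges are dropped.
    return [block for block in RE_HAN.split(text) if block]


blocks = split_han_blocks("我爱Python编程")
print(blocks)
```

Characters just past the upper bound, such as U+9FD6, fall outside the range and end up in the non-matching blocks.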
jerryday
e5e41a4aad fix pair object in dict problem 2015-10-30 16:38:50 +08:00
jerryday
4f8ca83661 add a withFlag param in textrank 2015-10-30 15:40:41 +08:00
Dingyuan Wang
ceb5c26be4 fix self.FREQ in cut_for_search; make pair object iterable 2015-06-01 14:36:38 +08:00
Dingyuan Wang
94840a734c wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default

Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
Wang Bin
84ffa0d4bf exclude word fragments from FREQ 2015-04-02 11:06:55 +08:00
Dingyuan Wang
f2b7183a71 use str.splitlines to avoid losing line breaks 2015-02-12 12:39:14 +08:00
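The behaviour this commit relies on can be shown with the standard library alone. This is a small illustration of `str.splitlines` with `keepends=True`, not code from the repository; the sample text is made up.

```python
text = "first line\nsecond line\r\nthird line"

# Without keepends the line terminators are dropped ...
stripped = text.splitlines()

# ... with keepends=True each piece keeps its own terminator,
# so joining the pieces reproduces the input exactly.
kept = text.splitlines(True)

print(stripped)
print(kept)
print("".join(kept) == text)
```

Because `\n` and `\r\n` endings both survive, the original text can be rebuilt byte for byte after line-by-line processing.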
Dingyuan Wang
32a0e92a09 don't compile re every time; autopep8 2015-02-10 21:22:34 +08:00
Dingyuan Wang
22bcf8be7a Merge master and jieba3k, make the code Python 2/3 compatible 2015-02-10 20:54:55 +08:00
Dingyuan Wang
4197dfb8fa store int directly in FREQ; small improvements 2015-02-09 16:26:00 +08:00
Dingyuan Wang
765fd6b7f0 store int directly in FREQ; small improvements 2015-02-09 16:14:12 +08:00
Dingyuan Wang
7bcb128f5f fix textrank divided by zero; fix posseg.pair.__repr__ 2014-12-20 00:12:42 +08:00
Dingyuan Wang
c6b386f65b update jieba3k 2014-11-29 16:06:20 +08:00
Dingyuan Wang
7b7c6955a9 complete the setup.py, fix #202 problem in posseg 2014-11-29 15:33:42 +08:00
fxsjy
447c1ded8c fix problem for python3.2 2014-11-15 13:44:30 +08:00
Dingyuan Wang
7a6caa0c3c port extract_tags, etc to jieba3k; add auto2to3 script 2014-11-07 23:33:31 +08:00
Dingyuan Wang
751ff35eb5 improve extract_tags; unify extract_tags and textrank 2014-10-31 23:15:51 +08:00
Dingyuan Wang
fd9f1f2c0e update README, textrank, etc. 2014-10-25 14:23:37 +08:00
Dingyuan Wang
bb1e6000c6 fix version; fix spaces at end of line 2014-10-19 10:57:46 +08:00
Dingyuan Wang
b367690eeb use prefix dict instead of trie, add a command line interface, and a few small improvements 2014-10-19 10:32:23 +08:00
Dingyuan Wang
51df77831b use prefix dict instead of trie, add a command line interface, and a few small improvements 2014-10-18 22:23:26 +08:00
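One way to read "prefix dict instead of trie" is a flat dictionary that stores every word together with all of its proper prefixes, so a left-to-right scan can stop as soon as a fragment is not a key. The sketch below works under that assumption and is not taken from the commit: the function names `build_prefix_dict` and `build_dag`, the zero count for prefixes that are not words, and the sample frequencies are all hypothetical.

```python
def build_prefix_dict(word_freqs):
    """Return (freq, total): words map to their count, bare prefixes to 0."""
    freq = {}
    total = 0
    for word, count in word_freqs.items():
        freq[word] = count
        total += count
        # Register every proper prefix so lookups can be extended char by char.
        for end in range(1, len(word)):
            prefix = word[:end]
            if prefix not in freq:
                freq[prefix] = 0
    return freq, total


def build_dag(sentence, freq):
    """For each start index, list the end indices of dictionary words."""
    dag = {}
    n = len(sentence)
    for start in range(n):
        ends = []
        end = start
        fragment = sentence[start]
        # Keep extending while the fragment is a known word or prefix.
        while end < n and fragment in freq:
            if freq[fragment] > 0:
                ends.append(end)
            end += 1
            fragment = sentence[start:end + 1]
        if not ends:
            ends.append(start)  # fall back to a single character
        dag[start] = ends
    return dag


# Made-up frequencies, for illustration only.
freq, total = build_prefix_dict({"北京": 10, "北京大学": 5, "大学": 8})
dag = build_dag("北京大学", freq)
print(dag)
```

Compared with a nested trie, this layout trades some extra keys for plain dictionary lookups on string slices.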
Dingyuan Wang
c04ccd0d12 Update to v0.32 according to the master branch. 2014-06-14 22:31:13 +08:00
ShuraChow
7583f7760a fix issue #161
On each call, posseg checks the length of jieba.user_word_tag_tab to determine whether new words have been loaded; if so, it updates word_tag_tab and then clears jieba.user_word_tag_tab
2014-06-10 02:04:09 +08:00
aholic
e2c796088f better indent 2014-01-24 00:43:48 +08:00
Sun Junyi
7e7fcc1184 add an option to disable HMM 2013-09-05 17:09:27 +08:00
fxsjy
21f7da0ca4 convert tabs to spaces 2013-08-30 18:31:25 +08:00
fxsjy
c5bd9773d1 fix bug in issue #103 2013-08-30 18:26:53 +08:00
ZoeyYoung
25839b5127 fix bug 2013-08-21 19:46:14 +08:00
ZoeyYoung
d49542c06e fix bug 2013-08-21 19:31:12 +08:00
ZoeyYoung
dce353f88b merge from master 2013-08-21 15:32:46 +08:00
ZoeyYoung
2857ae45cc Merge branch 'master' into jieba3k
Conflicts:
	Changelog
	jieba/__init__.py
	jieba/finalseg/__init__.py
	jieba/posseg/__init__.py
	setup.py
	test/parallel/test_file.py
	test/test_file.py
2013-08-21 13:55:21 +08:00
gwdwyy
cc81135429 sed -i 's/not \(.*\) in/\1 not in/g' ... 2013-08-20 20:08:03 +08:00
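The substitution in the commit message above can be reproduced with Python's `re.sub` to see what it does to a line. This is an illustrative sketch: the helper name `rewrite_not_in` and the sample lines are hypothetical, and the file arguments elided in the commit message are not reconstructed.

```python
import re


def rewrite_not_in(line):
    """Apply the equivalent of s/not \\(.*\\) in/\\1 not in/g to one line."""
    return re.sub(r"not (.*) in", r"\1 not in", line)


simple = rewrite_not_in("if not word in FREQ:")
print(simple)

# The group is greedy, so a line with two "in" tests is rewritten
# around the last " in" rather than the first one.
greedy = rewrite_not_in("if not a in b and c in d:")
print(greedy)
```

The second case shows why the result of such a bulk rewrite is worth reviewing: the greedy match changes the meaning of the condition.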
fxsjy
8e9b4bbe72 fix compatibility with Python 2.5 2013-07-25 10:25:24 +08:00
Sun Junyi
d4ede0fee6 keep backward compatibility; let Jython use a special loading workflow 2013-07-25 10:08:58 +08:00
piaolignxue
aea8496b1f serialize model to file so that it can support jython. 2013-07-24 22:50:48 +08:00
Sun Junyi
6549deabbd merge change from master 2013-07-16 11:06:41 +08:00
Sun Junyi
d63140fe5e keep a run of white spaces separated 2013-07-10 17:27:47 +08:00
Sun Junyi
b62f052927 PEP8 2013-07-03 17:21:21 +08:00
Sun Junyi
45daf561c7 follow PEP8: change tab to 4 white spaces 2013-07-03 16:58:22 +08:00
Sun Junyi
ca97b19951 merge change from master 2013-06-23 22:28:32 +08:00
fxsjy
e1afafe353 fix a bug in cxfree support 2013-06-23 12:50:28 +08:00
fxsjy
a9f53e9c85 don't separate CRLF 2013-06-22 21:56:39 +08:00
fxsjy
c015f4e297 support cxfree py2exe; keep white space 2013-06-22 21:24:45 +08:00
fxsjy
be1686654d merge master to jieba3k 2013-06-08 11:18:56 +08:00