pkpk
27910094ac
Fix bugs in Paddle seg and Paddle postag (#789)
...
* fix bugs in paddle seg and paddle postag
* fix compat in checking paddle
2019-12-24 21:02:55 +08:00
JesseyXujin
5b3bb4b7f2
Add Paddle word segmentation and part-of-speech tagging (#788)
...
* paddle cut release
* update README.md to prompt users to install paddlepaddle.tiny
* remove the UTF header from the two init.py files
* tweak README details
2019-12-24 17:27:41 +08:00
Sun Junyi
3d29b0c8e8
Merge pull request #310 from gumblex/master
...
Fix compatibility problem with `with` statement
2015-11-13 14:22:50 +08:00
Dingyuan Wang
1fcd3a417c
fix compatibility problem with `with` statement
2015-11-13 13:16:19 +08:00
Sun Junyi
093980647b
Merge pull request #303 from jerryday/master
...
add a withFlag param to extract_tags
2015-11-13 10:19:53 +08:00
Dingyuan Wang
8814e08f9b
load default dictionary from pkg_resources and improve the loading method;
...
change the serialized models from marshal to pickle
2015-11-12 20:18:09 +08:00
Dingyuan Wang
1c33252fce
change the recognized Chinese character range to [\u4E00-\u9FD5]
2015-11-09 20:23:43 +08:00
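A minimal sketch of what matching on the range named in the commit above looks like, assuming Python 3. The names `HAN_RUN` and `han_runs` are hypothetical and used only for illustration; the pattern the library actually compiles may include further characters beyond this range.

```python
import re

# Hypothetical name; only the range itself comes from the commit message.
HAN_RUN = re.compile("[\u4E00-\u9FD5]+")


def han_runs(text):
    """Return the maximal runs of characters inside U+4E00..U+9FD5."""
    return HAN_RUN.findall(text)


print(han_runs("我爱Python编程"))
```

Characters outside the range, such as the Latin letters here, act as boundaries between runs.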
jerryday
e5e41a4aad
fix pair object in dict problem
2015-10-30 16:38:50 +08:00
jerryday
4f8ca83661
add a withFlag param in textrank
2015-10-30 15:40:41 +08:00
Dingyuan Wang
ceb5c26be4
fix self.FREQ in cut_for_search; make pair object iterable
2015-06-01 14:36:38 +08:00
Dingyuan Wang
94840a734c
wraps most globals in classes
...
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default
Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
Wang Bin
84ffa0d4bf
exclude word fragments from FREQ
2015-04-02 11:06:55 +08:00
Dingyuan Wang
f2b7183a71
use str.splitlines to avoid losing line breaks
2015-02-12 12:39:14 +08:00
Dingyuan Wang
32a0e92a09
don't compile re every time; autopep8
2015-02-10 21:22:34 +08:00
Dingyuan Wang
22bcf8be7a
Merge master and jieba3k, make the code Python 2/3 compatible
2015-02-10 20:54:55 +08:00
Dingyuan Wang
4197dfb8fa
store int directly in FREQ; small improvements
2015-02-09 16:26:00 +08:00
Dingyuan Wang
765fd6b7f0
store int directly in FREQ; small improvements
2015-02-09 16:14:12 +08:00
Dingyuan Wang
7bcb128f5f
fix textrank divided by zero; fix posseg.pair.__repr__
2014-12-20 00:12:42 +08:00
Dingyuan Wang
c6b386f65b
update jieba3k
2014-11-29 16:06:20 +08:00
Dingyuan Wang
7b7c6955a9
complete the setup.py, fix #202 problem in posseg
2014-11-29 15:33:42 +08:00
fxsjy
447c1ded8c
fix problem for python3.2
2014-11-15 13:44:30 +08:00
Dingyuan Wang
7a6caa0c3c
port extract_tags, etc to jieba3k; add auto2to3 script
2014-11-07 23:33:31 +08:00
Dingyuan Wang
751ff35eb5
improve extract_tags; unify extract_tags and textrank
2014-10-31 23:15:51 +08:00
Dingyuan Wang
fd9f1f2c0e
update README, textrank, etc.
2014-10-25 14:23:37 +08:00
Dingyuan Wang
bb1e6000c6
fix version; fix spaces at end of line
2014-10-19 10:57:46 +08:00
Dingyuan Wang
b367690eeb
use prefix dict instead of trie, add a command line interface, and a few small improvements
2014-10-19 10:32:23 +08:00
Dingyuan Wang
51df77831b
use prefix dict instead of trie, add a command line interface, and a few small improvements
2014-10-18 22:23:26 +08:00
Dingyuan Wang
c04ccd0d12
Update to v0.32 according to the master branch.
2014-06-14 22:31:13 +08:00
ShuraChow
7583f7760a
fix issue #161
...
On each call, posseg checks the length of jieba.user_word_tag_tab to determine whether new words have been loaded; if so, it updates word_tag_tab and then clears jieba.user_word_tag_tab
2014-06-10 02:04:09 +08:00
aholic
e2c796088f
better indent
2014-01-24 00:43:48 +08:00
Sun Junyi
7e7fcc1184
add an option to disable HMM
2013-09-05 17:09:27 +08:00
fxsjy
21f7da0ca4
convert tabs to spaces
2013-08-30 18:31:25 +08:00
fxsjy
c5bd9773d1
fix bug in issue #103
2013-08-30 18:26:53 +08:00
ZoeyYoung
25839b5127
fix bug
2013-08-21 19:46:14 +08:00
ZoeyYoung
d49542c06e
fix bug
2013-08-21 19:31:12 +08:00
ZoeyYoung
dce353f88b
merge from master
2013-08-21 15:32:46 +08:00
ZoeyYoung
2857ae45cc
Merge branch 'master' into jieba3k
...
Conflicts:
Changelog
jieba/__init__.py
jieba/finalseg/__init__.py
jieba/posseg/__init__.py
setup.py
test/parallel/test_file.py
test/test_file.py
2013-08-21 13:55:21 +08:00
gwdwyy
cc81135429
sed -i 's/not \(.*\) in/\1 not in/g' ...
2013-08-20 20:08:03 +08:00
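For readers less familiar with sed, the substitution in the commit above can be approximated in Python as follows. This is only a sketch: the function name `rewrite_not_in` is hypothetical, and the files the original command was run on are elided in the message and not reproduced here.

```python
import re

# Same pattern as the sed expression, written in Python regex syntax:
# "not <something> in"  ->  "<something> not in"
NOT_IN = re.compile(r"not (.*) in")


def rewrite_not_in(line):
    """Apply the substitution to a single line of source code."""
    return NOT_IN.sub(r"\1 not in", line)


print(rewrite_not_in("if not key in table:"))
```

Because `.*` is greedy in both sed and Python, a line containing more than one ` in` is captured up to the last one, so the result of such a rewrite should be reviewed by hand.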
fxsjy
8e9b4bbe72
fix compatibility with Python 2.5
2013-07-25 10:25:24 +08:00
Sun Junyi
d4ede0fee6
keep backward compatibility; let Jython use a special loading workflow
2013-07-25 10:08:58 +08:00
piaolignxue
aea8496b1f
serialize the model to a file so that it can support Jython
2013-07-24 22:50:48 +08:00
Sun Junyi
6549deabbd
merge change from master
2013-07-16 11:06:41 +08:00
Sun Junyi
d63140fe5e
make a series of white spaces separated
2013-07-10 17:27:47 +08:00
Sun Junyi
b62f052927
PEP8
2013-07-03 17:21:21 +08:00
Sun Junyi
45daf561c7
follow PEP8: change tab to 4 white spaces
2013-07-03 16:58:22 +08:00
Sun Junyi
ca97b19951
merge change from master
2013-06-23 22:28:32 +08:00
fxsjy
e1afafe353
fix a bug of cxfree support
2013-06-23 12:50:28 +08:00
fxsjy
a9f53e9c85
don't separate CRLF
2013-06-22 21:56:39 +08:00
fxsjy
c015f4e297
support cxfree py2exe; keep white space
2013-06-22 21:24:45 +08:00
fxsjy
be1686654d
merge master to jieba3k
2013-06-08 11:18:56 +08:00