sunjunyi01
b4dd5b58f3
bug fix, issues: #511, #512
2017-08-28 21:10:50 +08:00
Sun Junyi
4eef868338
Merge pull request #455 from OOCZC/master
Update README.md
2017-04-06 15:22:01 +08:00
OOC
b485ae916c
Update README.md
2017-04-04 11:45:53 +08:00
OOC
ee0ce32bbd
Update
2017-04-04 11:17:44 +08:00
Sun Junyi
8ba26cf97e
Merge pull request #382 from huntzhan/master
Bugfix for HMM=False in parallelism.
2016-08-05 10:02:41 +08:00
huntzhan
60acefd9b1
Bugfix for HMM=False in parallelism.
2016-08-04 17:43:35 +08:00
Sun Junyi
03cd4b5fb6
Merge pull request #367 from yanyiwu/patch-1
Update README.md
2016-06-12 09:37:16 +08:00
Yanyi Wu
76ae798137
Update README.md
2016-06-10 22:48:01 +08:00
Sun Junyi
0243d568e9
Merge pull request #351 from gumblex/master
fix del_word
2016-03-16 10:22:34 +08:00
Dingyuan Wang
12b2b17741
fix del_word
2016-03-15 18:58:12 +08:00
fxsjy
1d5ea9f061
version change 0.38
2015-12-16 16:12:49 +08:00
Sun Junyi
e5c9af78e2
Merge pull request #315 from gumblex/master
Command-line segmentation now supports POS tagging
2015-11-17 19:13:36 +08:00
Dingyuan Wang
87734d3785
support POS tagging in __main__
2015-11-17 19:06:44 +08:00
Sun Junyi
3d29b0c8e8
Merge pull request #310 from gumblex/master
Fix compatibility problem with `with` statement
2015-11-13 14:22:50 +08:00
Dingyuan Wang
1fcd3a417c
fix compatibility problem with `with` statement
2015-11-13 13:16:19 +08:00
Sun Junyi
093980647b
Merge pull request #303 from jerryday/master
add a withFlag param to extract_tags
2015-11-13 10:19:53 +08:00
Sun Junyi
f73a2183a5
Merge pull request #309 from gumblex/master
Load the default dictionary with pkg_resources
2015-11-13 10:18:50 +08:00
Dingyuan Wang
8814e08f9b
load default dictionary from pkg_resources and improve the loading method;
change the serialized models from marshal to pickle
2015-11-12 20:18:09 +08:00
Sun Junyi
70f019b669
Merge pull request #307 from gumblex/master
Expand the Chinese character range; fix load_userdict
2015-11-09 22:12:23 +08:00
Dingyuan Wang
5270ed66ff
fix typo in type detection in load_userdict
2015-11-09 21:37:29 +08:00
Dingyuan Wang
99d0fb1a8a
use regex and fix encoding related issues in load_userdict
2015-11-09 20:54:50 +08:00
Dingyuan Wang
1c33252fce
change the recognized Chinese character range to [\u4E00-\u9FD5]
2015-11-09 20:23:43 +08:00
jerryday
e5e41a4aad
fix pair object in dict problem
2015-10-30 16:38:50 +08:00
jerryday
4f8ca83661
add a withFlag param in textrank
2015-10-30 15:40:41 +08:00
jerryday
26e339f8f7
add a withFlag param to extract_tags
2015-10-30 11:09:24 +08:00
Sun Junyi
b6f1ce773e
Merge pull request #298 from anderscui/master
Add introduction to jieba.NET port.
2015-09-23 06:54:56 +08:00
andersc
343bfe9783
Add introduction to jieba.NET port.
2015-09-22 23:16:02 +08:00
fxsjy
cb414cb861
version update
2015-06-27 16:49:44 +08:00
Sun Junyi
8e99a13aa9
Merge pull request #275 from gumblex/master
Prevent cache creation across different filesystems
2015-06-26 23:22:42 +08:00
Dingyuan Wang
d0e68974bf
improved doc for tmp_dir and cache_file
2015-06-26 22:24:20 +08:00
Dingyuan Wang
66fe17517d
prevent moving across different filesystems at tempfile.mkstemp
2015-06-26 22:12:39 +08:00
Dingyuan Wang
be46ddef9a
use shutil.move for all platforms in case of different filesystems
2015-06-26 21:52:53 +08:00
Sun Junyi
17652e764f
Merge pull request #271 from gumblex/master
Fix cut_for_search; improve the pair object
2015-06-01 18:40:31 +08:00
Dingyuan Wang
ceb5c26be4
fix self.FREQ in cut_for_search; make pair object iterable
2015-06-01 14:36:38 +08:00
Sun Junyi
9f4d9376b0
Merge pull request #269 from gumblex/master
Allow custom dictionaries to specify a POS tag while omitting the word frequency
2015-05-24 19:56:51 +08:00
Dingyuan Wang
3b76328f2a
allow ignoring word frequency while providing pos tag
2015-05-23 21:51:00 +08:00
Sun Junyi
3ec4c43788
Merge pull request #260 from gumblex/master
Wrap global functions in classes
2015-05-11 10:24:49 +08:00
Dingyuan Wang
94840a734c
wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default
Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
Sun Junyi
e359d08964
Merge pull request #257 from gip0/gip0-patch-1
fixed an error in load_userdict()
2015-05-02 17:27:16 +08:00
Gilbert Liu
f6e57ab2ae
fixed an error in load_userdict()
2015-05-01 12:52:28 -07:00
Sun Junyi
60f0028175
Merge pull request #252 from fukuball/master
Update README
2015-04-28 22:42:40 +08:00
Fukuball Lin
e712a4de61
Update README
Add information about the PHP version of Jieba
2015-04-28 22:05:26 +08:00
fxsjy
29d2b838dc
a minor version on PyPI that removes *.pyc files
2015-04-17 19:35:12 +08:00
fxsjy
c07b7fef54
hot-fix version for pull request #248
2015-04-10 18:54:51 +08:00
Sun Junyi
753c1be49c
Merge pull request #248 from wangbin/master
exclude word fragments from FREQ in posseg.cut
2015-04-02 15:32:41 +08:00
Wang Bin
84ffa0d4bf
exclude word fragments from FREQ
2015-04-02 11:06:55 +08:00
Sun Junyi
885417aed1
Merge pull request #247 from gumblex/master
Update docs
v0.36
2015-03-21 17:05:05 +08:00
Dingyuan Wang
eeaab012bf
update docs
2015-03-21 10:53:42 +08:00
fxsjy
89481cfd84
version update 0.36
2015-03-20 11:00:55 +08:00
Sun Junyi
59aa8b69b1
Merge pull request #246 from gumblex/master
Add automatic word frequency
2015-03-16 10:10:53 +08:00