JesseyXujin
5b3bb4b7f2
Add paddle-based word segmentation and POS tagging (#788)
* paddle cut release
* Update README.md to prompt users to install paddlepaddle.tiny
* Remove the UTF-8 encoding header from two init.py files
* Refine README details
2019-12-24 17:27:41 +08:00
Hongxiang Lin
38134ee20f
Fix the bug in what add_word refers to inside suggest_freq (#723)
2019-07-01 19:43:45 +08:00
Paul Meng
3645a5bb5d
Update README.md (#745)
2019-07-01 19:41:47 +08:00
Sun Junyi
8212b6c572
Update README.md
2018-12-03 16:29:32 +08:00
Sun Junyi
843cdc2b7c
Merge pull request #582 from hosiet/pr-fix-typo-codespell
Fix typos found by codespell
2018-09-20 10:44:47 +08:00
Sun Junyi
68f2a64f7e
Merge pull request #663 from JimCurryWang/patch-1
Fix __init__ "-" symbol issue
2018-09-20 10:40:35 +08:00
Sun Junyi
4c8479cfa6
Merge pull request #667 from ZhengZixiang/patch-1
fix the error about importing ChineseAnalyzer
2018-09-20 10:39:29 +08:00
imzhengzx
ca444fb4da
fix the error about importing ChineseAnalyzer
Because of an interface change to ChineseAnalyzer, the line 'from jieba.analyse import ChineseAnalyzer' in this test file raises an ImportError such as "cannot import name 'ChineseAnalyzer'". Changing the import to 'from jieba.analyse.analyzer import ChineseAnalyzer' fixes it.
2018-09-15 11:59:01 +08:00
CY Wang
36a27302ce
Fix __init__ "-" symbol issue
Fix an issue where words containing the "-" symbol could not be segmented as a unit.
For example, for keywords such as chap-EX喬沛詩 and SK-II,
the current version outputs "chap", "-", "EX喬沛詩", "SK", "-", "II".
After this change,
the new version outputs "chap-EX", "喬沛詩", "SK-II".
PS: I used jieba.load_userdict() and added "chap-EX", "喬沛詩", and "SK-II" to userdict.txt.
2018-08-27 17:05:46 +08:00
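The effect described in this commit can be sketched with two character-class regexes that split text into blocks before segmentation. This is a minimal illustration, assuming the default block pattern is a character class over Chinese characters and ASCII alphanumerics; `RE_BEFORE`, `RE_AFTER`, and `blocks` are hypothetical names, and the exact patterns in jieba's `__init__` may differ.

```python
import re

# Hypothetical block-splitting patterns: runs matching the class are kept
# together as one block; anything outside the class is split off on its own.
RE_BEFORE = re.compile(r"([\u4E00-\u9FD5a-zA-Z0-9+#&._%]+)")
# Assumed fix: add "-" to the class so "SK-II" stays in a single block.
RE_AFTER = re.compile(r"([\u4E00-\u9FD5a-zA-Z0-9+#&._%\-]+)")


def blocks(pattern, text):
    """Return the non-empty pieces of text after splitting on the pattern."""
    return [piece for piece in pattern.split(text) if piece]


print(blocks(RE_BEFORE, "SK-II"))  # ['SK', '-', 'II']
print(blocks(RE_AFTER, "SK-II"))   # ['SK-II']
```

Keeping "-" inside the block only makes "SK-II" reachable as one token; whether "chap-EX喬沛詩" then comes out as "chap-EX" and "喬沛詩" still depends on those words being in the dictionary, as the commit's PS notes.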
Sun Junyi
7653db2e33
Update README.md
2018-07-04 17:18:02 +08:00
Boyuan Yang
17ef8abba3
Fix typos found by codespell
2018-01-21 19:15:48 +08:00
fxsjy
cb0de2973b
version change 0.39
v0.39
2017-08-28 21:40:18 +08:00
sunjunyi01
b4dd5b58f3
bug fix, issues: #511, #512
2017-08-28 21:10:50 +08:00
Sun Junyi
4eef868338
Merge pull request #455 from OOCZC/master
Update README.md
2017-04-06 15:22:01 +08:00
OOC
b485ae916c
Update README.md
2017-04-04 11:45:53 +08:00
OOC
ee0ce32bbd
Update
2017-04-04 11:17:44 +08:00
Sun Junyi
8ba26cf97e
Merge pull request #382 from huntzhan/master
Bugfix for HMM=False in parallelism.
2016-08-05 10:02:41 +08:00
huntzhan
60acefd9b1
Bugfix for HMM=False in parallelism.
2016-08-04 17:43:35 +08:00
Sun Junyi
03cd4b5fb6
Merge pull request #367 from yanyiwu/patch-1
Update README.md
2016-06-12 09:37:16 +08:00
Yanyi Wu
76ae798137
Update README.md
2016-06-10 22:48:01 +08:00
Sun Junyi
0243d568e9
Merge pull request #351 from gumblex/master
fix del_word
2016-03-16 10:22:34 +08:00
Dingyuan Wang
12b2b17741
fix del_word
2016-03-15 18:58:12 +08:00
fxsjy
1d5ea9f061
version change 0.38
2015-12-16 16:12:49 +08:00
Sun Junyi
e5c9af78e2
Merge pull request #315 from gumblex/master
Support POS tagging in command-line segmentation
2015-11-17 19:13:36 +08:00
Dingyuan Wang
87734d3785
support POS tagging in __main__
2015-11-17 19:06:44 +08:00
Sun Junyi
3d29b0c8e8
Merge pull request #310 from gumblex/master
Fix compatibility problem with `with` statement
2015-11-13 14:22:50 +08:00
Dingyuan Wang
1fcd3a417c
fix compatibility problem with `with` statement
2015-11-13 13:16:19 +08:00
Sun Junyi
093980647b
Merge pull request #303 from jerryday/master
add a withFlag param to extract_tags
2015-11-13 10:19:53 +08:00
Sun Junyi
f73a2183a5
Merge pull request #309 from gumblex/master
Load the default dictionary with pkg_resources
2015-11-13 10:18:50 +08:00
Dingyuan Wang
8814e08f9b
load default dictionary from pkg_resources and improve the loading method;
change the serialized models from marshal to pickle
2015-11-12 20:18:09 +08:00
Sun Junyi
70f019b669
Merge pull request #307 from gumblex/master
Expand the Chinese character range; fix load_userdict
2015-11-09 22:12:23 +08:00
Dingyuan Wang
5270ed66ff
fix typo in type detection in load_userdict
2015-11-09 21:37:29 +08:00
Dingyuan Wang
99d0fb1a8a
use regex and fix encoding related issues in load_userdict
2015-11-09 20:54:50 +08:00
Dingyuan Wang
1c33252fce
change the recognized Chinese character range to [\u4E00-\u9FD5]
2015-11-09 20:23:43 +08:00
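A quick check of what the new range covers: `[\u4E00-\u9FD5]` spans the CJK Unified Ideographs block from U+4E00 through U+9FD5, and excludes CJK punctuation and full-width forms. The helper `is_han` is a hypothetical illustration, not part of jieba.

```python
import re

# The character range named in the commit message (both ends inclusive).
RE_HAN = re.compile(r"[\u4E00-\u9FD5]")


def is_han(ch: str) -> bool:
    """True if the single character ch falls inside U+4E00..U+9FD5."""
    return RE_HAN.fullmatch(ch) is not None


print(is_han("中"))      # True  (U+4E2D)
print(is_han("\u9FD5"))  # True  (upper bound)
print(is_han("\u9FD6"))  # False (just past the range)
print(is_han("，"))      # False (U+FF0C, full-width comma)
```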
jerryday
e5e41a4aad
fix pair object in dict problem
2015-10-30 16:38:50 +08:00
jerryday
4f8ca83661
add a withFlag param in textrank
2015-10-30 15:40:41 +08:00
jerryday
26e339f8f7
add a withFlag param to extract_tags
2015-10-30 11:09:24 +08:00
Sun Junyi
b6f1ce773e
Merge pull request #298 from anderscui/master
Add introduction to jieba.NET port.
2015-09-23 06:54:56 +08:00
andersc
343bfe9783
Add introduction to jieba.NET port.
2015-09-22 23:16:02 +08:00
fxsjy
cb414cb861
version update
2015-06-27 16:49:44 +08:00
Sun Junyi
8e99a13aa9
Merge pull request #275 from gumblex/master
Prevent creating the cache across filesystems
2015-06-26 23:22:42 +08:00
Dingyuan Wang
d0e68974bf
improved doc for tmp_dir and cache_file
2015-06-26 22:24:20 +08:00
Dingyuan Wang
66fe17517d
prevent moving across different filesystems at tempfile.mkstemp
2015-06-26 22:12:39 +08:00
Dingyuan Wang
be46ddef9a
use shutil.move for all platforms in case of different filesystems
2015-06-26 21:52:53 +08:00
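The pattern behind these three commits, writing the cache to a temporary file and then moving it into place, can be sketched with the standard library. This is a minimal sketch, not jieba's code: `write_cache` is a hypothetical helper, and creating the temp file in the destination's own directory (`dir=`) is the assumed way of keeping both files on the same filesystem.

```python
import os
import pickle
import shutil
import tempfile


def write_cache(obj, cache_file):
    """Hypothetical helper: pickle obj to cache_file via a temporary file."""
    # Create the temp file next to the destination so both live on the
    # same filesystem; a same-filesystem move is a plain rename.
    cache_dir = os.path.dirname(os.path.abspath(cache_file))
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp)
        # shutil.move falls back to copy + delete if a rename is not possible.
        shutil.move(tmp_path, cache_file)
    except BaseException:
        # Do not leave a half-written temp file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Readers of the cache path therefore see either the old file or the complete new one, never a partially written pickle.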
Sun Junyi
17652e764f
Merge pull request #271 from gumblex/master
Fix cut_for_search; improve the pair object
2015-06-01 18:40:31 +08:00
Dingyuan Wang
ceb5c26be4
fix self.FREQ in cut_for_search; make pair object iterable
2015-06-01 14:36:38 +08:00
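Making the pair object iterable lets a (word, flag) result be unpacked directly. A minimal sketch of the idea, assuming a pair with `word` and `flag` attributes as in jieba.posseg; the `Pair` class below is a toy stand-in, not jieba's source.

```python
class Pair:
    """Toy stand-in for a (word, POS flag) result object."""

    def __init__(self, word, flag):
        self.word = word
        self.flag = flag

    def __iter__(self):
        # Yield the fields in a fixed order so tuple unpacking works.
        return iter((self.word, self.flag))

    def __repr__(self):
        return f"Pair({self.word!r}, {self.flag!r})"


word, flag = Pair("北京", "ns")
print(word, flag)  # 北京 ns
```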
Sun Junyi
9f4d9376b0
Merge pull request #269 from gumblex/master
Allow custom dictionaries to specify a POS tag while omitting the word frequency
2015-05-24 19:56:51 +08:00
Dingyuan Wang
3b76328f2a
allow ignoring word frequency while providing pos tag
2015-05-23 21:51:00 +08:00
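A user-dictionary line of the form `word [freq] [tag]`, with both trailing fields optional, can be parsed with a single regex. This sketch assumes space-separated fields, a numeric frequency, and a lowercase POS tag; `RE_USERDICT` and `parse_line` are hypothetical names, and jieba's actual parser may differ.

```python
import re

# A word, then an optional " <digits>" frequency, then an optional " <letters>" tag.
RE_USERDICT = re.compile(r"^(.+?)( [0-9]+)?( [a-z]+)?$")


def parse_line(line):
    """Return (word, freq or None, tag or None) for one dictionary line."""
    m = RE_USERDICT.match(line.strip())
    if m is None:
        raise ValueError(f"bad dictionary line: {line!r}")
    word, freq, tag = m.groups()
    return word, (int(freq) if freq else None), (tag.strip() if tag else None)


print(parse_line("云计算 5 n"))  # ('云计算', 5, 'n')
print(parse_line("创新办 i"))    # ('创新办', None, 'i')
print(parse_line("台中"))        # ('台中', None, None)
```

The lazy `(.+?)` lets the optional groups claim a trailing number or tag, so a tag-only line such as `创新办 i` no longer needs a frequency.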
Sun Junyi
3ec4c43788
Merge pull request #260 from gumblex/master
Wrap global functions in classes
2015-05-11 10:24:49 +08:00
Dingyuan Wang
94840a734c
wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default
Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
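The API change above follows a common pattern: state lives on a class instance, a module-level default instance (`dt`) holds it, the old global functions become that instance's bound methods, and `lcut` is `list(cut(...))`. The toy below sketches that pattern with a hypothetical greedy two-character matcher; it does not reproduce jieba's segmentation, and the lock is only an assumed illustration of guarding lazy initialization across threads.

```python
import threading


class Tokenizer:
    """Toy tokenizer whose state lives on the instance, not in module globals."""

    def __init__(self, dictionary=None):
        self.lock = threading.RLock()  # guards lazy initialization
        self.dictionary = dictionary
        self.initialized = False

    def initialize(self):
        with self.lock:
            if not self.initialized:
                # Stand-in for loading a dictionary from disk.
                self.words = set(self.dictionary or ())
                self.initialized = True

    def cut(self, sentence):
        """Generator: greedily match known two-character words, else single chars."""
        self.initialize()
        i = 0
        while i < len(sentence):
            if sentence[i:i + 2] in self.words:
                yield sentence[i:i + 2]
                i += 2
            else:
                yield sentence[i]
                i += 1

    def lcut(self, sentence):
        """Same as cut(), but returns a list."""
        return list(self.cut(sentence))


# Module-level default instance; the "global" functions are its bound methods.
dt = Tokenizer(dictionary={"北京", "大学"})
cut = dt.cut
lcut = dt.lcut

print(lcut("北京大学"))  # ['北京', '大学']
```

Separate `Tokenizer` instances keep separate dictionaries, which is what makes per-instance customization possible while existing callers of the global functions keep working.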