497 Commits

Author SHA1 Message Date
Sun Junyi
843cdc2b7c
Merge pull request #582 from hosiet/pr-fix-typo-codespell
Fix typos found by codespell
2018-09-20 10:44:47 +08:00
Sun Junyi
68f2a64f7e
Merge pull request #663 from JimCurryWang/patch-1
Fix __init__ "-" symbol issue
2018-09-20 10:40:35 +08:00
Sun Junyi
4c8479cfa6
Merge pull request #667 from ZhengZixiang/patch-1
fix the error about importing ChineseAnalyzer
2018-09-20 10:39:29 +08:00
imzhengzx
ca444fb4da
fix the error about importing ChineseAnalyzer
Because of the interface change to ChineseAnalyzer, the code 'from jieba.analyse import ChineseAnalyzer' in this test file raises an ImportError such as "cannot import name 'ChineseAnalyzer'". Changing the import to 'from jieba.analyse.analyzer import ChineseAnalyzer' fixes it.
2018-09-15 11:59:01 +08:00
CY Wang
36a27302ce
Fix __init__ "-" symbol issue
Fixes the issue where the "-" symbol cannot be analyzed.

For example,
for keywords such as chap-EX喬沛詩, SK-II, etc.,
the present version shows "chap", "-", "EX喬沛詩", "SK", "-", "II".

After the change,
the new version shows "chap-EX", "喬沛詩", "SK-II".

P.S.: I used jieba.load_userdict() and added "chap-EX", "喬沛詩", and "SK-II" to userdict.txt.
2018-08-27 17:05:46 +08:00
Sun Junyi
7653db2e33
Update README.md 2018-07-04 17:18:02 +08:00
Boyuan Yang
17ef8abba3
Fix typos found by codespell 2018-01-21 19:15:48 +08:00
fxsjy
cb0de2973b version change 0.39 v0.39 2017-08-28 21:40:18 +08:00
sunjunyi01
b4dd5b58f3 bug fix, issue: #511, #512 2017-08-28 21:10:50 +08:00
Sun Junyi
4eef868338 Merge pull request #455 from OOCZC/master
Update README.md
2017-04-06 15:22:01 +08:00
OOC
b485ae916c Update README.md 2017-04-04 11:45:53 +08:00
OOC
ee0ce32bbd Update 2017-04-04 11:17:44 +08:00
Sun Junyi
8ba26cf97e Merge pull request #382 from huntzhan/master
Bugfix for HMM=False in parallelism.
2016-08-05 10:02:41 +08:00
huntzhan
60acefd9b1 Bugfix for HMM=False in parallelism. 2016-08-04 17:43:35 +08:00
Sun Junyi
03cd4b5fb6 Merge pull request #367 from yanyiwu/patch-1
Update README.md
2016-06-12 09:37:16 +08:00
Yanyi Wu
76ae798137 Update README.md 2016-06-10 22:48:01 +08:00
Sun Junyi
0243d568e9 Merge pull request #351 from gumblex/master
fix del_word
2016-03-16 10:22:34 +08:00
Dingyuan Wang
12b2b17741 fix del_word 2016-03-15 18:58:12 +08:00
fxsjy
1d5ea9f061 version change 0.38 2015-12-16 16:12:49 +08:00
Sun Junyi
e5c9af78e2 Merge pull request #315 from gumblex/master
Support POS tagging in command-line segmentation
2015-11-17 19:13:36 +08:00
Dingyuan Wang
87734d3785 support POS tagging in __main__ 2015-11-17 19:06:44 +08:00
Sun Junyi
3d29b0c8e8 Merge pull request #310 from gumblex/master
Fix compatibility problem with `with` statement
2015-11-13 14:22:50 +08:00
Dingyuan Wang
1fcd3a417c fix compatibility problem with `with` statement 2015-11-13 13:16:19 +08:00
Sun Junyi
093980647b Merge pull request #303 from jerryday/master
add a withFlag param to extract_tags
2015-11-13 10:19:53 +08:00
Sun Junyi
f73a2183a5 Merge pull request #309 from gumblex/master
Load the default dictionary with pkg_resources
2015-11-13 10:18:50 +08:00
Dingyuan Wang
8814e08f9b load default dictionary from pkg_resources and improve the loading method;
change the serialized models from marshal to pickle
2015-11-12 20:18:09 +08:00
Sun Junyi
70f019b669 Merge pull request #307 from gumblex/master
Expand the Chinese character range; fix load_userdict
2015-11-09 22:12:23 +08:00
Dingyuan Wang
5270ed66ff fix typo in type detection in load_userdict 2015-11-09 21:37:29 +08:00
Dingyuan Wang
99d0fb1a8a use regex and fix encoding related issues in load_userdict 2015-11-09 20:54:50 +08:00
Dingyuan Wang
1c33252fce change the recognized Chinese character range to [\u4E00-\u9FD5] 2015-11-09 20:23:43 +08:00
jerryday
e5e41a4aad fix pair object in dict problem 2015-10-30 16:38:50 +08:00
jerryday
4f8ca83661 add a withFlag param in textrank 2015-10-30 15:40:41 +08:00
jerryday
26e339f8f7 add a withFlag param to extract_tags 2015-10-30 11:09:24 +08:00
Sun Junyi
b6f1ce773e Merge pull request #298 from anderscui/master
Add introduction to jieba.NET port.
2015-09-23 06:54:56 +08:00
andersc
343bfe9783 Add introduction to jieba.NET port. 2015-09-22 23:16:02 +08:00
fxsjy
cb414cb861 version update 2015-06-27 16:49:44 +08:00
Sun Junyi
8e99a13aa9 Merge pull request #275 from gumblex/master
Prevent creating the cache across filesystems
2015-06-26 23:22:42 +08:00
Dingyuan Wang
d0e68974bf improved doc for tmp_dir and cache_file 2015-06-26 22:24:20 +08:00
Dingyuan Wang
66fe17517d prevent moving across different filesystems at tempfile.mkstemp 2015-06-26 22:12:39 +08:00
Dingyuan Wang
be46ddef9a use shutil.move for all platforms in case of different filesystems 2015-06-26 21:52:53 +08:00
Sun Junyi
17652e764f Merge pull request #271 from gumblex/master
Fix cut_for_search; improve the pair object
2015-06-01 18:40:31 +08:00
Dingyuan Wang
ceb5c26be4 fix self.FREQ in cut_for_search; make pair object iterable 2015-06-01 14:36:38 +08:00
Sun Junyi
9f4d9376b0 Merge pull request #269 from gumblex/master
Allow custom dictionaries to specify a POS tag while omitting the word frequency
2015-05-24 19:56:51 +08:00
Dingyuan Wang
3b76328f2a allow ignoring word frequency while providing pos tag 2015-05-23 21:51:00 +08:00
Sun Junyi
3ec4c43788 Merge pull request #260 from gumblex/master
Wrap global functions in classes
2015-05-11 10:24:49 +08:00
Dingyuan Wang
94840a734c wraps most globals in classes
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default

Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
Sun Junyi
e359d08964 Merge pull request #257 from gip0/gip0-patch-1
fixed an error in load_userdict()
2015-05-02 17:27:16 +08:00
Gilbert Liu
f6e57ab2ae fixed an error in load_userdict() 2015-05-01 12:52:28 -07:00
Sun Junyi
60f0028175 Merge pull request #252 from fukuball/master
Update README
2015-04-28 22:42:40 +08:00
Fukuball Lin
e712a4de61 Update README
Add information about the PHP version of jieba
2015-04-28 22:05:26 +08:00