fxsjy
5704e23bbf
update version: 0.42
2020-01-13 21:24:45 +08:00
fxsjy
aa65031788
fix file mode
2020-01-13 21:03:38 +08:00
fxsjy
2eb11c8028
fix issue #810
2020-01-13 20:53:43 +08:00
JesseyXujin
d703bce302
paddle coredump exception fix ( #807 )
...
* paddle_null_point_fix
* add core exception note
* delete yield
* modify test paddle for supporting enable_paddle()
2020-01-10 16:30:46 +08:00
vissssa
dc2b788eb3
refactor: improve check_paddle_installed ( #806 )
2020-01-09 19:23:11 +08:00
fxsjy
0868c323d9
update version in __init__.py
2020-01-08 16:21:07 +08:00
fxsjy
eb37e048da
update version to 0.41
2020-01-08 16:04:30 +08:00
JesseyXujin
381b0691ac
Add enable_paddle interface to install paddle and import packages ( #802 )
...
* enable_paddle_interface
* Add enable_paddle interface to install paddle and import packages
* add posseg lcut for paddle mode
* fix vocabulary
2020-01-08 15:26:12 +08:00
fxsjy
97c32464e1
fix issue #798
2020-01-03 14:10:48 +08:00
Tim Gates
0489a6979e
Fix simple typo: vocabuary -> vocabulary ( #797 )
...
Closes #796
2020-01-02 10:26:00 +08:00
JesseyXujin
30ea8f929e
Simplify Paddle import check ( #795 )
2019-12-31 15:03:14 +08:00
JesseyXujin
0b74b6c2de
add jieba upgrade note to README.md and change `import imp` to `import importlib` in _compat.py ( #794 )
2019-12-31 14:14:50 +08:00
JesseyXujin
17bab6a2d1
Modify the error-reporting mechanism for paddle version detection ( #790 )
2019-12-25 19:46:49 +08:00
fxsjy
d47e14e5b3
update version
2019-12-25 10:34:18 +08:00
pkpk
27910094ac
Fix bugs in Paddle seg and Paddle postag ( #789 )
...
* fix bugs in paddle seg and paddle postag
* fix compat in checking paddle
2019-12-24 21:02:55 +08:00
fxsjy
478c3b9bb4
lazy import paddle
2019-12-24 19:19:51 +08:00
JesseyXujin
5b3bb4b7f2
Add paddle word segmentation and part-of-speech tagging ( #788 )
...
* paddle cut release
* update README.md to prompt users to install paddlepaddle.tiny
* remove the UTF header from the two init.py files
* revise README details
2019-12-24 17:27:41 +08:00
Hongxiang Lin
38134ee20f
Fix the bug in what add_word refers to in suggest_freq ( #723 )
2019-07-01 19:43:45 +08:00
Sun Junyi
843cdc2b7c
Merge pull request #582 from hosiet/pr-fix-typo-codespell
...
Fix typos found by codespell
2018-09-20 10:44:47 +08:00
CY Wang
36a27302ce
Fix __init__ "-" symbol issue
...
Fix the issue where the "-" symbol cannot be handled.
For example,
for keywords such as chap-EX喬沛詩, SK-II, etc.,
the current version outputs "chap", "-", "EX喬沛詩", "SK", "-", "II".
After this change,
the new version outputs "chap-EX", "喬沛詩", "SK-II".
P.S. I used jieba.load_userdict() and added "chap-EX", "喬沛詩", and "SK-II" to userdict.txt.
2018-08-27 17:05:46 +08:00
Boyuan Yang
17ef8abba3
Fix typos found by codespell
2018-01-21 19:15:48 +08:00
fxsjy
cb0de2973b
version change 0.39
2017-08-28 21:40:18 +08:00
sunjunyi01
b4dd5b58f3
bug fix, issue: #511 , #512
2017-08-28 21:10:50 +08:00
huntzhan
60acefd9b1
Bugfix for HMM=False in parallelism.
2016-08-04 17:43:35 +08:00
Dingyuan Wang
12b2b17741
fix del_word
2016-03-15 18:58:12 +08:00
fxsjy
1d5ea9f061
version change 0.38
2015-12-16 16:12:49 +08:00
Dingyuan Wang
87734d3785
support POS tagging in __main__
2015-11-17 19:06:44 +08:00
Sun Junyi
3d29b0c8e8
Merge pull request #310 from gumblex/master
...
Fix compatibility problem with `with` statement
2015-11-13 14:22:50 +08:00
Dingyuan Wang
1fcd3a417c
fix compatibility problem with `with` statement
2015-11-13 13:16:19 +08:00
Sun Junyi
093980647b
Merge pull request #303 from jerryday/master
...
add a withFlag param to extract_tags
2015-11-13 10:19:53 +08:00
Dingyuan Wang
8814e08f9b
load default dictionary from pkg_resources and improve the loading method;
...
change the serialized models from marshal to pickle
2015-11-12 20:18:09 +08:00
Dingyuan Wang
5270ed66ff
fix typo in type detection in load_userdict
2015-11-09 21:37:29 +08:00
Dingyuan Wang
99d0fb1a8a
use regex and fix encoding related issues in load_userdict
2015-11-09 20:54:50 +08:00
Dingyuan Wang
1c33252fce
change the recognized Chinese character range to [\u4E00-\u9FD5]
2015-11-09 20:23:43 +08:00
jerryday
e5e41a4aad
fix pair object in dict problem
2015-10-30 16:38:50 +08:00
jerryday
4f8ca83661
add a withFlag param in textrank
2015-10-30 15:40:41 +08:00
jerryday
26e339f8f7
add a withFlag param to extract_tags
2015-10-30 11:09:24 +08:00
fxsjy
cb414cb861
version update
2015-06-27 16:49:44 +08:00
Dingyuan Wang
66fe17517d
prevent moving across different filesystems at tempfile.mkstemp
2015-06-26 22:12:39 +08:00
Dingyuan Wang
be46ddef9a
use shutil.move for all platforms in case of different filesystems
2015-06-26 21:52:53 +08:00
Dingyuan Wang
ceb5c26be4
fix self.FREQ in cut_for_search; make pair object iterable
2015-06-01 14:36:38 +08:00
Dingyuan Wang
3b76328f2a
allow ignoring word frequency while providing pos tag
2015-05-23 21:51:00 +08:00
Dingyuan Wang
94840a734c
wraps most globals in classes
...
API changes:
* class jieba.Tokenizer, jieba.posseg.POSTokenizer
* class jieba.analyse.TFIDF, jieba.analyse.TextRank
* global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer
* multiprocessing only works with jieba.(posseg.)dt
* new lcut, lcut_for_search functions that return a list
* jieba.analyse.textrank now returns 20 items by default
Tests:
* added test_lock.py to test multithread locking
* demo.py now contains most of the examples in README
2015-05-09 21:29:05 +08:00
Gilbert Liu
f6e57ab2ae
fixed an error in load_userdict()
2015-05-01 12:52:28 -07:00
fxsjy
29d2b838dc
a minor version on pypi, which removes *.pyc
2015-04-17 19:35:12 +08:00
Wang Bin
84ffa0d4bf
exclude word fragments from FREQ
2015-04-02 11:06:55 +08:00
fxsjy
89481cfd84
version update 0.36
2015-03-20 11:00:55 +08:00
Dingyuan Wang
4a552ca94f
suggest word frequency, support passing str to add_word
2015-03-14 12:44:19 +08:00
Yuan-Yi Chang
62433a3205
Allow jieba to use a user-specified directory for the generated cache_file, so that jieba can run on a read-only file system
...
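1. Before calling jieba.cut() or related functions, set the directory via jieba.tmp_dir
2. When the application environment is a read-only file system, jieba can run normally if the cache_file is generated in advance
3. Real-world cases are Google App Engine and Heroku: the free tier of the former has only 128MB of memory and cannot run jieba, while the free tier of the latter has 512MB and runs normally. Before deploying, generate the cache_file locally, then deploy it together with the application to Google App Engine or Heroku.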
2015-02-27 17:25:59 +08:00
Dingyuan Wang
f29430f49e
details in textrank; update README
2015-02-16 21:25:55 +08:00