114 Commits

Author SHA1 Message Date
Dingyuan Wang
f2b7183a71 use str.splitlines to avoid losing line breaks 2015-02-12 12:39:14 +08:00
Dingyuan Wang
f808ea0ebb use only one dict to store words and prefixes 2015-02-12 10:31:52 +08:00
Dingyuan Wang
32a0e92a09 don't compile re every time; autopep8 2015-02-10 21:22:34 +08:00
Dingyuan Wang
22bcf8be7a Merge master and jieba3k, make the code Python 2/3 compatible 2015-02-10 20:54:55 +08:00
Dingyuan Wang
4197dfb8fa store int directly in FREQ; small improvements 2015-02-09 16:26:00 +08:00
Dingyuan Wang
765fd6b7f0 store int directly in FREQ; small improvements 2015-02-09 16:14:12 +08:00
Dingyuan Wang
c6b386f65b update jieba3k 2014-11-29 16:06:20 +08:00
Dingyuan Wang
7b7c6955a9 complete the setup.py, fix #202 problem in posseg 2014-11-29 15:33:42 +08:00
Nomaka
9cb76dd8b9 Update __init__.py
The idx parameter of calc is unused
2014-11-18 16:00:49 +08:00
fxsjy
447c1ded8c fix problem for python3.2 2014-11-15 13:44:30 +08:00
Dingyuan Wang
7a6caa0c3c port extract_tags, etc to jieba3k; add auto2to3 script 2014-11-07 23:33:31 +08:00
Dingyuan Wang
e3f3dcccba improve the loading and caching process 2014-10-31 21:56:09 +08:00
fxsjy
ba87fcb01f remove trie, use prefix set instead 2014-10-20 14:08:09 +08:00
fxsjy
82bfffb6ed version update to 0.34 2014-10-20 13:35:13 +08:00
Dingyuan Wang
b367690eeb use prefix dict instead of trie, add a command line interface, and a few small improvements 2014-10-19 10:32:23 +08:00
Dingyuan Wang
51df77831b use prefix dict instead of trie, add a command line interface, and a few small improvements 2014-10-18 22:23:26 +08:00
Dingyuan Wang
626b415152 fix dict.itervalues mistake 2014-09-07 19:21:13 +08:00
Dingyuan Wang
6a3f228c72 fix python3 stuff 2014-09-07 18:50:10 +08:00
Dingyuan Wang
6fad5fbb2c update to v0.33 2014-09-06 23:28:47 +08:00
Fukuball Lin
7198d562f1 Allow jieba to switch idf corpora
1. Add a Traditional Chinese idf corpus
2. To let jieba switch idf corpora, add the get_idf and set_idf_path methods and rewrite extract_tags
3. Add extract_tags_idfpath to the tests
2014-08-05 22:55:13 +08:00
Dingyuan Wang
c04ccd0d12 Update to v0.32 according to the master branch. 2014-06-14 22:31:13 +08:00
Dingyuan Wang
81f77d7a08 Fix the re in enable_parallel. 2014-06-14 15:22:13 +08:00
davidlihm
5b2ec920ed Update __init__.py 2014-05-15 07:55:11 +08:00
jagt
7f3513edb7 close cache file to avoid warning message. 2014-04-24 00:35:09 +08:00
wind
7488b114e7 use logging instead of print in init file 2014-03-20 13:48:33 +13:00
Sun Junyi
3e430e9769 Update __init__.py 2014-02-16 20:09:57 +08:00
fxsjy
5e6a2c4661 fix a bug of add_word 2013-12-05 13:35:40 +08:00
fxsjy
136676381a fix a bug of add_word 2013-12-05 13:33:24 +08:00
Herman Schaaf
95286b8887 Fix typo in error message 2013-10-21 22:21:09 +09:00
fxsjy
759e1029c8 add an API to control log level: jieba.setLogLevel 2013-09-22 10:26:33 +08:00
Mozillazg
1cf3f0d00b use logging instead of print 2013-09-19 10:31:44 +08:00
Sun Junyi
7e7fcc1184 add an option to disable HMM 2013-09-05 17:09:27 +08:00
fxsjy
c5bd9773d1 fix bug in issue #103 2013-08-30 18:26:53 +08:00
ZoeyYoung
dce353f88b merge from master 2013-08-21 15:32:46 +08:00
ZoeyYoung
2857ae45cc Merge branch 'master' into jieba3k
Conflicts:
	Changelog
	jieba/__init__.py
	jieba/finalseg/__init__.py
	jieba/posseg/__init__.py
	setup.py
	test/parallel/test_file.py
	test/test_file.py
2013-08-21 13:55:21 +08:00
gwdwyy
cc81135429 sed -i 's/not \(.*\) in/\1 not in/g' ... 2013-08-20 20:08:03 +08:00
Sun Junyi
90ab511deb fix the bug about issue: #92 2013-08-09 13:59:02 +08:00
fxsjy
b77645b3aa modify test_file.py; use less memory 2013-07-29 10:17:39 +08:00
fxsjy
ed1fa64e27 fix a bug: sys.version_info.major can't be used in Python 2.5 2013-07-29 10:07:55 +08:00
Sun Junyi
0f972df0ac raise exception in case of lower version 2013-07-29 10:01:47 +08:00
Sun Junyi
e68bb5a28e fix a compatibility problem;python2.5 has no 'multiprocessing'; 2013-07-29 09:57:09 +08:00
Sun Junyi
689e27280a Merge branch 'master' of https://github.com/fxsjy/jieba 2013-07-29 09:49:10 +08:00
Sun Junyi
9d87e798fd 0.31 release 2013-07-29 09:48:53 +08:00
Linker Lin
1dbc525dff Automatically detect the number of CPUs and start an appropriate number of processes. 2013-07-28 00:10:27 +08:00
Sun Junyi
6549deabbd merge change from master 2013-07-16 11:06:41 +08:00
Sun Junyi
d63140fe5e make a serial white spaces separated 2013-07-10 17:27:47 +08:00
Richard Wong
c2ded83ead Refactor: fix line indent to 4.
* jieba/__init__.py (cut):
2013-07-10 16:22:49 +08:00
Richard Wong
99d2492d67 Add re.U flag to re variable. 2013-07-10 16:22:17 +08:00
Richard Wong
fbfaac2eaa Reindent function
* jieba/__init__.py (require_initialized):
2013-07-08 13:54:36 +08:00
Richard Wong
7bfd432fc5 Remove the unused imports. 2013-07-08 13:51:39 +08:00