90 Commits

Author SHA1 Message Date
fxsjy
04eb4f08cf fix a bug of changing dictionary 2013-04-26 16:48:46 +08:00
fxsjy
bc049090a5 make lazy load thread safe 2013-04-26 12:54:05 +08:00
fxsjy
d2460029d5 merge lazy load 2013-04-26 09:57:06 +08:00
Herman Schaaf
c6098a8657 Add initialize function and lazy initialization 2013-04-25 21:04:56 +09:00
fxsjy
47d94a13e6 log(1)==0, since we have changed from PRODUCT to sum of LOG 2013-04-25 10:11:04 +08:00
fxsjy
c350fab2b9 fix wrong line number 2013-04-25 09:28:00 +08:00
fxsjy
65b78b2b4d read() and then split -- faster; from __future__ import with 2013-04-24 22:14:10 +08:00
Neuron Teckid
166c2ca7a5 auto close file; locate error when failing to parse 2013-04-24 19:01:08 +08:00
fxsjy
3f003e2f29 new method: jieba.disable_parallel, which is the inverse operation of jieba.enable_parallel 2013-04-22 12:35:17 +08:00
fxsjy
b46166f768 use CRLF as separator to make chunks in parallel mode 2013-04-20 18:46:04 +08:00
fxsjy
62cf22121f new feature: parallel segment with multiprocessing 2013-04-20 14:11:31 +08:00
Sun Junyi
6da857b554 merge changes from master branch 2013-04-19 10:21:34 +08:00
Sun Junyi
012fddf13f ignore white space 2013-04-12 22:37:53 +08:00
fxsjy
45591bb9ab support flag '_'; ignore white space 2013-04-12 21:53:03 +08:00
Sun Junyi
c77823aa1d merge improvement to Py3k branch 2013-04-12 14:58:25 +08:00
Sun Junyi
a383f035ba support decimal point: example PI=3.141569 => PI / = / 3.14159 2013-04-08 09:38:49 +08:00
Sun Junyi
659326c4e1 punctuation; improve keywords extraction 2013-04-06 14:02:11 +08:00
Sun Junyi
8e49199993 keep punctuation marks 2013-04-05 21:48:36 +08:00
Sun Junyi
58c363655c support user defined word tag 2013-03-25 17:28:37 +08:00
Sun Junyi
44e19a2e27 fix bug in pypy 2013-03-22 15:20:19 +08:00
Sun Junyi
0f4f9067c3 fix bugs in jieba for py3k 2013-03-21 11:10:57 +08:00
Sun Junyi
d58402c8f6 for issue 26 2013-02-18 10:31:20 +08:00
Sun Junyi
981d58e106 for issue 26 2013-02-18 10:20:17 +08:00
Sun Junyi
1edc1651ee try to fix this issue: https://github.com/fxsjy/jieba/issues/26 2013-02-17 16:04:51 +08:00
Sun Junyi
fd20cbbd4b use logarithmic addition instead of multiplication, to avoid bad case in issue19 2012-12-28 11:29:51 +08:00
Sun Junyi
379cd4933a support en-chn mixed words, like B超 2012-12-12 11:03:29 +08:00
Sun Junyi
9c07d80edb first py3k version of jieba 2012-11-28 10:50:40 +08:00
Sun Junyi
5ce72e76b1 add new method: cut_for_search(sentence), which can get better recall rate for search engine's reverse index 2012-11-27 13:37:40 +08:00
Sun Junyi
80bf2fec30 Merge branch 'master' of https://github.com/fxsjy/jieba 2012-11-23 16:01:25 +08:00
Sun Junyi
400889b25c enhance cut_all=True mode 2012-11-23 15:59:15 +08:00
Felix Yan
085b09c3ea add file-like object support 2012-11-21 18:07:19 +08:00
Sun Junyi
193bfee1d4 use only one dictionary 2012-11-06 11:01:31 +08:00
fxsjy
90cd4b3014 improve POS tagging 2012-11-06 07:17:26 +08:00
Sun Junyi
d040e92987 new interface: load_userdict(file_name) 2012-10-25 17:06:39 +08:00
Sun Junyi
14faea710b use file cache to improve the loading speed after the first time of importing 2012-10-25 12:18:33 +08:00
fxsjy
ef0c0284ff improve speed 2012-10-09 06:37:01 +08:00
fxsjy
9180b90ae3 make model loading faster 2012-10-06 18:28:52 +08:00
fxsjy
164b782c4e improve the speed 2012-10-04 13:10:56 +08:00
fxsjy
51765aa6dd first commit 2012-10-01 15:25:06 +08:00
Sun Junyi
6f6e812afb first commit 2012-09-29 15:54:04 +08:00