186 Commits

Author SHA1 Message Date
fxsjy
a9f53e9c85 don't separate CRLF 2013-06-22 21:56:39 +08:00
fxsjy
c015f4e297 support cx_Freeze and py2exe; keep whitespace 2013-06-22 21:24:45 +08:00
fxsjy
7343679ba8 fix a bug in parallel mode 2013-06-21 15:09:27 +08:00
Sun Junyi
c0816b9bb0 more mixed words 2013-06-18 18:09:55 +08:00
Sun Junyi
c9e8da9e63 add more mix words to dict.txt 2013-06-18 14:10:36 +08:00
Sun Junyi
9d1e23ce6f speed up the viterbi 2013-06-16 13:21:43 +08:00
Sun Junyi
b050bfe946 remove some useless words 2013-06-08 15:40:01 +08:00
fxsjy
be1686654d merge master to jieba3k 2013-06-08 11:18:56 +08:00
fxsjy
e12e176d17 roll back; the previous change gave no obvious speedup 2013-06-07 15:51:48 +08:00
fxsjy
d3531f197d roll back; the previous change gave no obvious speedup 2013-06-07 15:51:13 +08:00
fxsjy
f2d6abf063 speed up of viterbi 2013-06-07 14:41:55 +08:00
fxsjy
0087a4e7e3 adjust prob_trans for better support of named entities; fix some bad cases 2013-06-07 13:59:36 +08:00
cloudaice
dfc807e65b Don't lose information about a function when using a decorator 2013-05-23 00:25:45 +02:00
Sun Junyi
a8f902545c fix some bad cases 2013-05-15 18:21:08 +08:00
cloudaice
9b0f60df93 Catch specific errors 2013-05-10 11:26:27 +02:00
cloudaice
8ba8735f46 Use more explicit expressions 2013-05-10 11:09:41 +02:00
Sun Junyi
ff4ea5d882 fix a file leak bug 2013-05-02 11:24:22 +08:00
Sun Junyi
35aa38ed12 fix a bug caused by default argument binding 2013-04-28 12:04:16 +08:00
fxsjy
aae91b6fb6 merge change from master to jieba3k 2013-04-27 16:04:16 +08:00
Sun Junyi
94d455b079 hotfix for cut_all=True 2013-04-27 10:23:01 +08:00
Sun Junyi
59d5d3b811 fix bug and change version 2013-04-27 09:45:39 +08:00
fxsjy
c8df565981 more log trace for trouble shooting 2013-04-26 17:43:24 +08:00
fxsjy
04eb4f08cf fix a bug when changing the dictionary 2013-04-26 16:48:46 +08:00
fxsjy
bc049090a5 make lazy load thread safe 2013-04-26 12:54:05 +08:00
fxsjy
d2460029d5 merge lazy load 2013-04-26 09:57:06 +08:00
Herman Schaaf
c6098a8657 Add initialize function and lazy initialization 2013-04-25 21:04:56 +09:00
fxsjy
47d94a13e6 log(1)==0, since we have changed from PRODUCT to sum of LOG 2013-04-25 10:11:04 +08:00
fxsjy
c350fab2b9 fix wrong line number 2013-04-25 09:28:00 +08:00
fxsjy
65b78b2b4d read() and then split (faster); from __future__ import with_statement 2013-04-24 22:14:10 +08:00
Neuron Teckid
166c2ca7a5 auto close file; locate error when failing to parse 2013-04-24 19:01:08 +08:00
fxsjy
3f003e2f29 new method: jieba.disable_parallel, which is the inverse operation of jieba.enable_parallel 2013-04-22 12:35:17 +08:00
fxsjy
b46166f768 use CRLF as the separator to make chunks in parallel mode 2013-04-20 18:46:04 +08:00
fxsjy
62cf22121f new feature: parallel segment with multiprocessing 2013-04-20 14:11:31 +08:00
Sun Junyi
6da857b554 merge changes from master branch 2013-04-19 10:21:34 +08:00
Sun Junyi
8d89e8afda handle 的 2013-04-19 10:02:33 +08:00
Sun Junyi
012fddf13f ignore white space 2013-04-12 22:37:53 +08:00
fxsjy
45591bb9ab support flag '_'; ignore white space 2013-04-12 21:53:03 +08:00
Sun Junyi
c77823aa1d merge improvement to Py3k branch 2013-04-12 14:58:25 +08:00
Sun Junyi
94ad7e7035 support decimal point 2013-04-08 09:53:04 +08:00
Sun Junyi
72fff6c8e2 support decimal point 2013-04-08 09:40:32 +08:00
Sun Junyi
a383f035ba support decimal point: example PI=3.141569 => PI / = / 3.14159 2013-04-08 09:38:49 +08:00
fxsjy
600a7fc285 CRLF to LF 2013-04-07 22:30:18 +08:00
fxsjy
ddeb766202 CRLF to LF 2013-04-07 22:29:39 +08:00
fxsjy
6632bb80ec CRLF to LF 2013-04-07 22:27:58 +08:00
fxsjy
f1d5d90ae6 CRLF to LF 2013-04-07 22:27:17 +08:00
Sun Junyi
659326c4e1 punctuation; improve keywords extraction 2013-04-06 14:02:11 +08:00
Sun Junyi
7d227da5c4 punctuation 2013-04-05 22:49:16 +08:00
Sun Junyi
8e49199993 keep punctuation marks 2013-04-05 21:48:36 +08:00
Sun Junyi
58c363655c support user defined word tag 2013-03-25 17:28:37 +08:00
Sun Junyi
44e19a2e27 fix bug in pypy 2013-03-22 15:20:19 +08:00
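Commit 47d94a13e6 notes that log(1)==0 after switching from a product of probabilities to a sum of log-probabilities. A minimal Python sketch of why the neutral starting value changes from 1 to 0, assuming toy probability values (not jieba's actual model) and hypothetical helper names `score_product` and `score_log_sum`:

```python
import math

# Hypothetical toy probabilities for illustration only; not jieba's real model.
probs = [0.5, 0.2, 0.1]


def score_product(ps):
    # Product form: the neutral starting value is 1.
    total = 1.0
    for p in ps:
        total *= p
    return total


def score_log_sum(ps):
    # Log form: multiplying probabilities becomes adding their logs,
    # so the neutral starting value becomes log(1) == 0.
    total = 0.0
    for p in ps:
        total += math.log(p)
    return total


if __name__ == "__main__":
    # exp() of the log-sum recovers the product (up to float rounding).
    print(score_product(probs), math.exp(score_log_sum(probs)))
    # A long chain of small probabilities underflows to 0.0 as a product,
    # while the log-sum stays a finite negative number.
    print(score_product([1e-5] * 100), score_log_sum([1e-5] * 100))
```

Besides changing the neutral element, working in log space avoids floating-point underflow on long paths, which is the usual reason Viterbi-style scoring is done with sums of logs.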