fxsjy | 04eb4f08cf | fix a bug of changing dictionary | 2013-04-26 16:48:46 +08:00
fxsjy | bc049090a5 | make lazy load thread safe | 2013-04-26 12:54:05 +08:00
fxsjy | d2460029d5 | merge lazy load | 2013-04-26 09:57:06 +08:00
Herman Schaaf | c6098a8657 | Add initialize function and lazy initialization | 2013-04-25 21:04:56 +09:00
fxsjy | 47d94a13e6 | log(1)==0, since we have changed from PRODUCT to sum of LOG | 2013-04-25 10:11:04 +08:00
fxsjy | c350fab2b9 | fix wrong line number | 2013-04-25 09:28:00 +08:00
fxsjy | 65b78b2b4d | read() and then split -- faster; from __future__ import with | 2013-04-24 22:14:10 +08:00
Neuron Teckid | 166c2ca7a5 | auto close file; locate error when failing to parse | 2013-04-24 19:01:08 +08:00
fxsjy | 3f003e2f29 | new method: jieba.disable_parallel, which is the inverse operation of jieba.enable_parallel | 2013-04-22 12:35:17 +08:00
fxsjy | b46166f768 | use CRLF as separator to make chunks in parallel mode | 2013-04-20 18:46:04 +08:00
fxsjy | 62cf22121f | new feature: parallel segment with multiprocessing | 2013-04-20 14:11:31 +08:00
Sun Junyi | 6da857b554 | merge changes from master branch | 2013-04-19 10:21:34 +08:00
Sun Junyi | 012fddf13f | ignore white space | 2013-04-12 22:37:53 +08:00
fxsjy | 45591bb9ab | support flag '_'; ignore white space | 2013-04-12 21:53:03 +08:00
Sun Junyi | c77823aa1d | merge improvement to Py3k branch | 2013-04-12 14:58:25 +08:00
Sun Junyi | a383f035ba | support decimal point: example PI=3.141569 => PI / = / 3.14159 | 2013-04-08 09:38:49 +08:00
Sun Junyi | 659326c4e1 | punctuation; improve keywords extraction | 2013-04-06 14:02:11 +08:00
Sun Junyi | 8e49199993 | keep punctuation marks | 2013-04-05 21:48:36 +08:00
Sun Junyi | 58c363655c | support user defined word tag | 2013-03-25 17:28:37 +08:00
Sun Junyi | 44e19a2e27 | fix bug in pypy | 2013-03-22 15:20:19 +08:00
Sun Junyi | 0f4f9067c3 | fix bugs in jieba for py3k | 2013-03-21 11:10:57 +08:00
Sun Junyi | d58402c8f6 | for issue 26 | 2013-02-18 10:31:20 +08:00
Sun Junyi | 981d58e106 | for issue 26 | 2013-02-18 10:20:17 +08:00
Sun Junyi | 1edc1651ee | try to fix this issue: https://github.com/fxsjy/jieba/issues/26 | 2013-02-17 16:04:51 +08:00
Sun Junyi | fd20cbbd4b | use logarithmic addition instead of multiplication, to avoid bad case in issue19 | 2012-12-28 11:29:51 +08:00
Sun Junyi | 379cd4933a | support en-chn mixed words, like B超 | 2012-12-12 11:03:29 +08:00
Sun Junyi | 9c07d80edb | first py3k version of jieba | 2012-11-28 10:50:40 +08:00
Sun Junyi | 5ce72e76b1 | add new method: cut_for_search(sentence), which can get better recall rate for search engine's reverse index | 2012-11-27 13:37:40 +08:00
Sun Junyi | 80bf2fec30 | Merge branch 'master' of https://github.com/fxsjy/jieba | 2012-11-23 16:01:25 +08:00
Sun Junyi | 400889b25c | enhance cut_all=True mode | 2012-11-23 15:59:15 +08:00
Felix Yan | 085b09c3ea | add file-like object support | 2012-11-21 18:07:19 +08:00
Sun Junyi | 193bfee1d4 | use only one dictionary | 2012-11-06 11:01:31 +08:00
fxsjy | 90cd4b3014 | improve POS tagging | 2012-11-06 07:17:26 +08:00
Sun Junyi | d040e92987 | new interface: load_userdict(file_name) | 2012-10-25 17:06:39 +08:00
Sun Junyi | 14faea710b | use file cache to improve the loading speed after the first time of importing | 2012-10-25 12:18:33 +08:00
fxsjy | ef0c0284ff | improve speed | 2012-10-09 06:37:01 +08:00
fxsjy | 9180b90ae3 | make model loading faster | 2012-10-06 18:28:52 +08:00
fxsjy | 164b782c4e | improve the speed | 2012-10-04 13:10:56 +08:00
fxsjy | 51765aa6dd | first commit | 2012-10-01 15:25:06 +08:00
Sun Junyi | 6f6e812afb | first commit | 2012-09-29 15:54:04 +08:00