42 Commits

Author SHA1 Message Date
Sun Junyi
c9e8da9e63 add more mix words to dict.txt 2013-06-18 14:10:36 +08:00
fxsjy
0087a4e7e3 adjust prob_trans for better support of named entities; fix some bad cases 2013-06-07 13:59:36 +08:00
Sun Junyi
4300f79788 add an example of using sklearn+jieba 2013-05-17 09:35:12 +08:00
Sun Junyi
a8f902545c fix some bad cases 2013-05-15 18:21:08 +08:00
cloudaice
9ee20a5293 add generator test 2013-05-11 22:50:30 +02:00
cloudaice
0c050b5eb2 add jieba.posseg test case 2013-05-11 17:40:43 +02:00
cloudaice
b0f9e6721e add cutall test case 2013-05-11 17:40:43 +02:00
cloudaice
a7ff398edc add three test cases: cut, set_dictionary, cut_for_search 2013-05-11 17:40:43 +02:00
cloudaice
667203a9ae replace tabs with spaces; use join instead of a loop 2013-05-11 17:40:43 +02:00
cloudaice
a2d2078465 replace tabs with spaces; use 'is' to check whether an object is None 2013-05-11 17:40:42 +02:00
cloudaice
e0434871eb reformat the code in demo.py to conform to PEP 8 2013-05-11 17:40:42 +02:00
Sun Junyi
c1bf815343 update test case 2013-05-02 17:01:16 +08:00
Sun Junyi
94d455b079 hot fix of cut_all=True 2013-04-27 10:23:01 +08:00
Sun Junyi
59d5d3b811 fix bug and change version 2013-04-27 09:45:39 +08:00
fxsjy
8666428fb0 fix a bug of changing dictionary 2013-04-26 16:47:00 +08:00
fxsjy
9bebe6120b utf-8 output is more friendly to Linux 2013-04-26 16:19:00 +08:00
Sun Junyi
d3339633d5 in the speed test: initialize first to ignore the time of dict loading 2013-04-26 14:51:58 +08:00
fxsjy
bc049090a5 make lazy load thread safe 2013-04-26 12:54:05 +08:00
fxsjy
b46166f768 use CRLF as separator to make chunks in parallel mode 2013-04-20 18:46:04 +08:00
fxsjy
6b83593b5a rm stub.log 2013-04-20 14:13:10 +08:00
fxsjy
62cf22121f new feature: parallel segment with multiprocessing 2013-04-20 14:11:31 +08:00
Sun Junyi
8d89e8afda handle 的 2013-04-19 10:02:33 +08:00
fxsjy
45591bb9ab support flag '_'; ignore white space 2013-04-12 21:53:03 +08:00
Sun Junyi
94ad7e7035 support decimal point 2013-04-08 09:53:04 +08:00
Sun Junyi
a383f035ba support decimal point: example PI=3.141569 => PI / = / 3.14159 2013-04-08 09:38:49 +08:00
Sun Junyi
8e49199993 keep punctuation marks 2013-04-05 21:48:36 +08:00
Sun Junyi
58c363655c support user defined word tag 2013-03-25 17:28:37 +08:00
Sun Junyi
6cc0e95759 rm 1.log 2013-03-22 15:19:57 +08:00
Sun Junyi
d2634a049b fix a bug in pypy 2013-03-22 15:16:47 +08:00
Sun Junyi
06ebc6f71c en-chn mix words in POS 2012-12-12 14:24:44 +08:00
Sun Junyi
a8ae0398b4 add one example 2012-12-12 13:40:22 +08:00
Sun Junyi
6517119110 remove 1.log 2012-12-12 11:04:35 +08:00
Sun Junyi
8c05efed68 remove tlbb.txt 2012-12-12 11:04:19 +08:00
Sun Junyi
379cd4933a support en-chn mixed words, like B超 2012-12-12 11:03:29 +08:00
Sun Junyi
e0bd9a6a50 version change; doc update 2012-11-27 14:06:46 +08:00
Sun Junyi
176c49d15c remove some files 2012-11-06 10:32:34 +08:00
Sun Junyi
59c3efeb2f improve speed of tagging 2012-11-06 10:32:00 +08:00
fxsjy
1a2a64a13f one more example of POS tagging 2012-11-06 07:44:39 +08:00
fxsjy
90cd4b3014 improve POS tagging 2012-11-06 07:17:26 +08:00
Sun Junyi
15a5a2d50e add a sample script about tags extraction 2012-10-16 13:25:35 +08:00
fxsjy
64b3c0d0e0 add one more example 2012-10-06 14:50:10 +08:00
fxsjy
d2bee13d9d add setup.py 2012-10-01 16:53:26 +08:00