cppjieba/README.md
2013-08-20 21:17:54 +08:00

# CppJieba
> A C++ version of the "Jieba" Chinese word segmentation library.
> For the original "Jieba" project, see https://github.com/fxsjy/jieba
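# Detail
> 1. Segmentation currently supports UTF-8 and GBK encoded text. The default encoding is UTF-8.
> 2. The segmentation algorithm does not yet include the HMM model.
> 3. Keyword extraction is, for now, aimed at very short texts such as titles and is largely not general-purpose.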
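
To illustrate what dictionary-only segmentation (without an HMM) can look like, here is a minimal sketch that picks the split with the highest total log-probability from a word-frequency table. It is not CppJieba's actual code: `SplitUtf8`, `Segment`, and the frequency table passed in are hypothetical names chosen for this example, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: split a UTF-8 string into one substring per code point.
// Assumes well-formed UTF-8; no validation is performed.
std::vector<std::string> SplitUtf8(const std::string& text) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = 1;               // ASCII or stray byte
        if (lead >= 0xF0) len = 4;         // 4-byte sequence
        else if (lead >= 0xE0) len = 3;    // 3-byte sequence (most CJK)
        else if (lead >= 0xC0) len = 2;    // 2-byte sequence
        chars.push_back(text.substr(i, len));
        i += len;
    }
    return chars;
}

// Hypothetical function: dictionary-only segmentation (no HMM).
// Chooses the split with the highest sum of log(word frequency / total).
// A character not covered by any dictionary word becomes its own token.
std::vector<std::string> Segment(
        const std::string& text,
        const std::unordered_map<std::string, double>& freq) {
    const std::vector<std::string> chars = SplitUtf8(text);
    const std::size_t n = chars.size();

    double total = 0.0;
    for (const auto& kv : freq) total += kv.second;
    // Floor score for a single character missing from the dictionary.
    const double unknown = std::log(1.0 / (total + 1.0));

    // best[i]: best score for the suffix starting at character i.
    // next[i]: exclusive end index of the first word in that best suffix.
    std::vector<double> best(n + 1, 0.0);
    std::vector<std::size_t> next(n + 1, n);
    for (std::size_t i = n; i-- > 0;) {
        best[i] = -std::numeric_limits<double>::infinity();
        std::string word;
        // Quadratic in the input length; fine for a short illustration.
        for (std::size_t j = i; j < n; ++j) {
            word += chars[j];
            const auto it = freq.find(word);
            double score;
            if (it != freq.end() && it->second > 0.0) {
                score = std::log(it->second / total);
            } else if (j == i) {
                score = unknown;           // single unknown character
            } else {
                continue;                  // multi-character non-word
            }
            if (score + best[j + 1] > best[i]) {
                best[i] = score + best[j + 1];
                next[i] = j + 1;
            }
        }
    }

    // Follow next[] from the start to emit the chosen words.
    std::vector<std::string> words;
    for (std::size_t i = 0; i < n; i = next[i]) {
        assert(next[i] > i);  // every step consumes at least one character
        std::string word;
        for (std::size_t k = i; k < next[i]; ++k) word += chars[k];
        words.push_back(word);
    }
    return words;
}
```

With a toy table such as `我:100, 来到:50, 北京:80, 清华:30, 大学:60, 清华大学:20`, calling `Segment("我来到北京清华大学", freq)` yields `我 / 来到 / 北京 / 清华大学`. A sketch like this has no way to group characters that are absent from the dictionary, which is the gap an HMM stage is typically meant to fill.
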
# Demo
## Segmentation demo
```
cd ./demo;
make;
./segment_demo testlines.gbk
```
Run `./segment_demo` without arguments to print the usage help.
# Contact
wuyanyi09@gmail.com
# Thanks
> Author of the "Jieba" Chinese word segmentation library: SunJunyi
> https://github.com/fxsjy/jieba