2013-08-02 22:01:44 +08:00


cppjieba
========
A C++ version of the "Jieba" (结巴) Chinese word segmentation library.
For details on "Jieba" Chinese word segmentation, see:
https://github.com/fxsjy/jieba

Motivation
========
Written for my own needs, based on the source code of the Python version of jieba.
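
Details
========
1. Only GBK-encoded text is supported for now.
2. The segmentation algorithm does not yet include the HMM model (which the Python jieba uses to recognize words missing from the dictionary).
3. Keyword extraction is currently tuned for very short text such as titles and is not general-purpose.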
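
Because input must be GBK, a segmenter first has to split a byte string into characters before it can look anything up in a dictionary. The following is a minimal sketch of that step, not cppjieba's actual code; the function name `SplitGbk` is hypothetical. It relies on the GBK byte layout: bytes 0x00-0x7F are single-byte ASCII, and a double-byte character is a lead byte in 0x81-0xFE followed by a trail byte in 0x40-0xFE other than 0x7F.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a GBK-encoded byte string into characters.
// - 0x00-0x7F: one-byte ASCII character.
// - lead 0x81-0xFE + trail 0x40-0xFE (not 0x7F): one two-byte character.
// Any other byte is kept on its own so malformed input does not stop the scan.
std::vector<std::string> SplitGbk(const std::string& s) {
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead >= 0x81 && lead <= 0xFE && i + 1 < s.size()) {
            unsigned char trail = static_cast<unsigned char>(s[i + 1]);
            if (trail >= 0x40 && trail <= 0xFE && trail != 0x7F) {
                chars.push_back(s.substr(i, 2));  // double-byte character
                i += 2;
                continue;
            }
        }
        chars.push_back(s.substr(i, 1));  // ASCII or stray byte
        i += 1;
    }
    return chars;
}
```

For example, `SplitGbk(std::string("ab") + "\xB0\xCD")` yields three characters: `a`, `b`, and the two-byte sequence `0xB0 0xCD`.
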
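
Without the HMM part, segmentation is purely dictionary-based. The Python jieba builds a directed acyclic graph (DAG) of every dictionary word that can start at each position, then runs dynamic programming from the end of the sentence backward to choose the path with the highest total log-probability, where a word's probability is its frequency divided by the sum of all frequencies (words not in the dictionary count as frequency 1). The sketch below follows that idea and may differ from cppjieba's actual code. `FreqDict`, `Segment`, and the `maxWordLen` cap are hypothetical (jieba bounds its scan with a prefix dictionary instead), and the input is assumed to be already split into characters.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary type: word -> frequency count.
using FreqDict = std::unordered_map<std::string, double>;

// Dictionary-based maximum-probability segmentation (no HMM).
// `chars` is the input already split into characters.
std::vector<std::string> Segment(const std::vector<std::string>& chars,
                                 const FreqDict& dict,
                                 std::size_t maxWordLen = 8) {
    const std::size_t n = chars.size();
    if (maxWordLen == 0) maxWordLen = 1;  // a single character is always allowed

    double total = 0.0;
    for (const auto& kv : dict) total += kv.second;
    if (total <= 0.0) total = 1.0;  // avoid log(0) on an empty dictionary
    const double logTotal = std::log(total);

    // dag[i]: ascending end indices j such that chars[i..j] is a candidate word.
    // The single character chars[i] is always a candidate.
    std::vector<std::vector<std::size_t>> dag(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::string word;
        for (std::size_t j = i; j < n && j - i < maxWordLen; ++j) {
            word += chars[j];
            if (j == i || dict.count(word) > 0) dag[i].push_back(j);
        }
    }

    // best[k]: highest log-probability for segmenting chars[k..n-1].
    // nextEnd[k]: end index of the first word on that best path.
    std::vector<double> best(n + 1, 0.0);
    std::vector<std::size_t> nextEnd(n, 0);
    for (std::size_t k = n; k-- > 0;) {
        best[k] = -std::numeric_limits<double>::infinity();
        std::string word;
        std::size_t built = k;  // chars[k..built-1] are already in `word`
        for (std::size_t j : dag[k]) {
            while (built <= j) word += chars[built++];
            auto it = dict.find(word);
            // Missing or zero-frequency words count as frequency 1, as in jieba.
            double freq = (it != dict.end() && it->second > 0.0) ? it->second : 1.0;
            double score = std::log(freq) - logTotal + best[j + 1];
            if (score > best[k]) {
                best[k] = score;
                nextEnd[k] = j;
            }
        }
    }

    // Walk the best path from the start and emit the words.
    std::vector<std::string> words;
    for (std::size_t i = 0; i < n; i = nextEnd[i] + 1) {
        std::string word;
        for (std::size_t j = i; j <= nextEnd[i]; ++j) word += chars[j];
        words.push_back(word);
    }
    return words;
}
```

For example, with the dictionary `{ab: 10, bc: 1, a: 1, b: 1, c: 5}`, `Segment({"a", "b", "c"}, dict)` returns `{"ab", "c"}`, since log(10) + log(5) outscores every other split. Working in log space turns the product of word probabilities into a sum and avoids floating-point underflow on long sentences.
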

Contact
========
wuyanyi09@gmail.com