mirror of
https://github.com/fxsjy/jieba.git
synced 2025-07-10 00:01:33 +08:00
Merge pull request #191 from sing1ee/master
add some introduction of textrank
commit
0049b0c5b4
29
README.md
@@ -153,6 +153,35 @@ Output:
* Usage example: https://github.com/fxsjy/jieba/blob/master/test/extract_tags_with_weight.py

#### Keyword extraction based on the TextRank algorithm

[Link to the algorithm paper](http://digital.library.unt.edu/ark:/67531/metadc30962/m2/1/high_res_d/Mihalcea-2004-TextRank-Bringing_Order_into_Texts.pdf)

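##### Basic idea:

1. Segment the text from which keywords are to be extracted.
1. Build a graph from word co-occurrence relations within a fixed-size window (I chose 5; adjust as appropriate).
1. Compute the PageRank of each node in the graph; note that the graph is undirected and weighted.
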
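The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code added by this commit: the names `build_graph`, `rank_nodes`, and `textrank_keywords` are hypothetical, the damping factor 0.85 and the iteration count are assumed defaults, and the input is taken to be an already segmented word list (step 1), so no jieba call appears here.

```python
from collections import defaultdict


def build_graph(words, window=5):
    """Build an undirected weighted co-occurrence graph.

    graph[a][b] == graph[b][a] is the number of times a and b
    appear together inside a sliding window of `window` words.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        # Pair each word with the words that follow it inside the window.
        for other in words[i + 1:i + window]:
            if other == word:
                continue  # skip self-loops
            graph[word][other] += 1.0
            graph[other][word] += 1.0
    return graph


def rank_nodes(graph, damping=0.85, iterations=30):
    """Weighted PageRank on an undirected graph by power iteration."""
    scores = {node: 1.0 for node in graph}
    # Total edge weight of each node; a node splits its score among
    # its neighbours in proportion to the edge weights.
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            incoming = sum(
                weight / strength[nbr] * scores[nbr]
                for nbr, weight in graph[node].items()
            )
            new_scores[node] = (1.0 - damping) + damping * incoming
        scores = new_scores
    return scores


def textrank_keywords(words, window=5, top_k=10):
    """Return the top_k (word, score) pairs, highest score first."""
    scores = rank_nodes(build_graph(words, window))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy input standing in for segmented text (hypothetical data).
    sample = ["graph", "rank", "graph", "node", "rank", "graph", "edge"]
    for word, score in textrank_keywords(sample, window=3, top_k=3):
        print(word, round(score, 3))
```

Because the graph is undirected, every edge is stored in both directions, so a single pass over each node's neighbours gives the incoming score. A real pipeline would typically also filter the segmented words (for example by part of speech or a stop-word list) before building the graph; that step is left out of this sketch.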
##### Basic usage:

1. Call `jieba.analyse.textrank.textrank(raw_text)`; pay attention to the import path of the package.

##### Sample result:

Sample result from `__main__`:

```shell
吉林 100.0
欧亚 86.4592606421
置业 55.3262889963
实现 52.0353476663
收入 37.9475518129
增资 35.5042189944
子公司 34.9286032861
全资 30.8154823412
城市 30.6031961172
商业 30.4779050167
```

Feature 4): Part-of-speech tagging
================
* Tags the part of speech of each word after a sentence is segmented, using a notation compatible with ictclas