mirror of https://github.com/fxsjy/jieba.git (synced 2025-07-24 00:00:05 +08:00)
Update readme in both languages with new functions
parent c6098a8657
commit 7342a18534
29
README.md
@@ -116,8 +116,18 @@ https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
 2. A dictionary file with better support for traditional Chinese segmentation
 https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big

-Download the dictionary you need, then overwrite jieba/dict.txt with it.
+Download the dictionary you need, then either overwrite jieba/dict.txt with it or call `jieba.set_dictionary('data/dict.txt.big')`.

+Initialization
+=====
+By default, jieba uses lazy loading and builds the trie only when it is first needed. This takes 1-3 seconds, once; after that it is not initialized again. If you want to initialize jieba manually, you can call:
+
+    import jieba
+    jieba.initialize()
+
+You can also specify the dictionary to use at this step (optional):
+
+    jieba.initialize('data/dict.txt.big')
+
 Segmentation speed
 =========
@@ -233,13 +243,26 @@ Using Other Dictionaries
 ========
 It is possible to supply Jieba with your own custom dictionary, and there are also two dictionaries readily available for download:

-1. You can employ a smaller dictionary to use less memory:
+1. You can employ a smaller dictionary for a smaller memory footprint:
 https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small

 2. There is also a bigger file that has better support for traditional characters (繁體):
 https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big

-In either case, download the file you want first, and then call `jieba.load_userdict('dict.txt.small')` or just replace the existing `dict.txt`.
+By default, an in-between dictionary is used, called `dict.txt` and included in the distribution.

+In either case, download the file you want first, and then call `jieba.set_dictionary('data/dict.txt.big')` or just replace the existing `dict.txt`.
+
+Initialization
+========
+By default, Jieba employs lazy loading to only build the trie once it is necessary. This takes 1-3 seconds once, after which it is not initialized again. If you want to initialize Jieba manually, you can call:
+
+    import jieba
+    jieba.initialize()
+
+You can also specify the dictionary to use in this step (optional):
+
+    jieba.initialize('data/dict.txt.big')
+
 Segmentation speed
 =========
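
The lazy-loading behavior described in the new Initialization section can be illustrated with a minimal sketch. This is not Jieba's implementation: the `LazyTrie` class, its `contains` method, and the `build_count` counter are hypothetical names chosen for illustration only. The sketch shows just the pattern, namely that the trie is built on first use (or earlier, if `initialize()` is called explicitly) and is not rebuilt afterward.

```python
import threading


class LazyTrie:
    """Hypothetical stand-in for a segmenter's dictionary; not Jieba's code."""

    def __init__(self, words):
        self._words = list(words)
        self._root = None              # the trie has not been built yet
        self._lock = threading.Lock()  # guards the one-time build
        self.build_count = 0           # how many times the trie was built

    def initialize(self):
        """Build the trie if needed; later calls return immediately."""
        with self._lock:
            if self._root is not None:
                return
            root = {}
            for word in self._words:
                node = root
                for ch in word:
                    node = node.setdefault(ch, {})
                node[""] = True        # end-of-word marker
            self._root = root
            self.build_count += 1

    def contains(self, word):
        self.initialize()              # lazy: the first lookup triggers the build
        node = self._root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return "" in node


trie = LazyTrie(["中国", "中文", "分词"])
assert trie.build_count == 0           # nothing is built at construction time
trie.initialize()                      # optional eager build, paid up front
assert trie.contains("分词") and not trie.contains("中")
assert trie.build_count == 1           # the build happened exactly once
```

Calling `initialize()` up front moves the one-time build cost to a moment the caller chooses, for example application startup, instead of the first lookup.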