Update readme in both languages with new functions

Herman Schaaf 2013-04-25 21:46:15 +09:00
parent c6098a8657
commit 7342a18534

@@ -116,8 +116,18 @@ https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
 2. 支持繁体分词更好的词典文件
 https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
-下载你所需要的词典然后覆盖jieba/dict.txt 即可
+下载你所需要的词典然后覆盖jieba/dict.txt 即可或者用`jieba.set_dictionary('data/dict.txt.big')`
+初始化
+=====
+默认情况下jieba采用延迟加载一旦有必要建立trie。这需要1-3秒一次而以之后还没有重新初始化。如果你想手工初始jieba您可以用
+import jieba
+jieba.initialize()
+在这一步还可以指定要使用的词典(可选):
+jieba.initialize('data/dict.txt.big')
 分词速度
 =========
@@ -233,13 +243,26 @@ Using Other Dictionaries
 ========
 It is possible to supply Jieba with your own custom dictionary, and there are also two dictionaries readily available for download:
-1. You can employ a smaller dictionary to use less memory:
+1. You can employ a smaller dictionary for a smaller memory footprint:
 https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.small
 2. There is also a bigger file that has better support for traditional characters (繁體):
 https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big
-In either case, download the file you want first, and then call `jieba.load_userdict('dict.txt.small')` or just replace the existing `dict.txt`.
+By default, an in-between dictionary is used, called `dict.txt` and included in the distribution.
+In either case, download the file you want first, and then call `jieba.set_dictionary('data/dict.txt.big')` or just replace the existing `dict.txt`.
+Initialization
+========
+By default, Jieba employs lazy loading to only build the trie once it is necessary. This takes 1-3 seconds once, after which it is not initialized again. If you want to initialize Jieba manually, you can call:
+import jieba
+jieba.initialize()
+You can also specify the dictionary to use in this step (optional):
+jieba.initialize('data/dict.txt.big')
 Segmentation speed
 =========