diff --git a/README.md b/README.md index fd31119..8c0d764 100644 --- a/README.md +++ b/README.md @@ -45,17 +45,19 @@ http://jiebademo.ap01.aws.af.cm/ 主要功能 ======= -1) :分词 +1. 分词 -------- * `jieba.cut` 方法接受三个输入参数: 需要分词的字符串;cut_all 参数用来控制是否采用全模式;HMM 参数用来控制是否使用 HMM 模型 * `jieba.cut_for_search` 方法接受两个参数:需要分词的字符串;是否使用 HMM 模型。该方法适合用于搜索引擎构建倒排索引的分词,粒度比较细 * 待分词的字符串可以是 unicode 或 UTF-8 字符串、GBK 字符串。注意:不建议直接输入 GBK 字符串,可能无法预料地错误解码成 UTF-8 -* `jieba.cut` 以及 `jieba.cut_for_search` 返回的结构都是一个可迭代的 generator,可以使用 for 循环来获得分词后得到的每一个词语(unicode),也可以用 list(jieba.cut(...)) 转化为 list +* `jieba.cut` 以及 `jieba.cut_for_search` 返回的结构都是一个可迭代的 generator,可以使用 for 循环来获得分词后得到的每一个词语(unicode),或者用 +* `jieba.lcut` 以及 `jieba.lcut_for_search` 直接返回 list +* `jieba.Tokenizer(dictionary=DEFAULT_DICT)` 新建自定义分词器,可用于同时使用不同词典。`jieba.dt` 为默认分词器,所有全局分词相关函数都是该分词器的映射。 -代码示例( 分词 ) +代码示例 ```python -#encoding=utf-8 +# encoding=utf-8 import jieba seg_list = jieba.cut("我来到北京清华大学", cut_all=True) @@ -81,7 +83,7 @@ print(", ".join(seg_list)) 【搜索引擎模式】: 小明, 硕士, 毕业, 于, 中国, 科学, 学院, 科学院, 中国科学院, 计算, 计算所, 后, 在, 日本, 京都, 大学, 日本京都大学, 深造 -2) :添加自定义词典 +2. 添加自定义词典 ---------------- ### 载入词典 @@ -91,6 +93,8 @@ print(", ".join(seg_list)) * 词典格式和`dict.txt`一样,一个词占一行;每一行分三部分,一部分为词语,另一部分为词频(可省略),最后为词性(可省略),用空格隔开 * 词频可省略,使用计算出的能保证分出该词的词频 +* 更改分词器的 tmp_dir 和 cache_file 属性,可指定缓存文件位置,用于受限的文件系统。 + * 范例: * 自定义词典:https://github.com/fxsjy/jieba/blob/master/test/userdict.txt @@ -128,12 +132,18 @@ print(", ".join(seg_list)) * "通过用户自定义词典来增强歧义纠错能力" --- https://github.com/fxsjy/jieba/issues/14 -3) :关键词提取 +3. 关键词提取 ------------- -* jieba.analyse.extract_tags(sentence,topK,withWeight) #需要先 `import jieba.analyse` -* sentence 为待提取的文本 -* topK 为返回几个 TF/IDF 权重最大的关键词,默认值为 20 -* withWeight 为是否一并返回关键词权重值,默认值为 False +### 基于 TF-IDF 算法的关键词抽取 + +`import jieba.analyse` + +* jieba.analyse.extract_tags(sentence, topK=20, withWeight=False, allowPOS=()) + * sentence 为待提取的文本 + * topK 为返回几个 TF/IDF 权重最大的关键词,默认值为 20 + * withWeight 为是否一并返回关键词权重值,默认值为 False + * allowPOS 仅包括指定词性的词,默认值为空,即不筛选 +* jieba.analyse.TFIDF(idf_path=None) 新建 TFIDF 实例,idf_path 为 IDF 频率文件 代码示例 (关键词提取) @@ -155,37 +165,27 @@ https://github.com/fxsjy/jieba/blob/master/test/extract_tags.py * 用法示例:https://github.com/fxsjy/jieba/blob/master/test/extract_tags_with_weight.py -#### 基于TextRank算法的关键词抽取实现 +### 基于 TextRank 算法的关键词抽取 + +* jieba.analyse.textrank(sentence, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v')) 直接使用,接口相同,注意默认过滤词性。 +* jieba.analyse.TextRank() 新建自定义 TextRank 实例 + 算法论文: [TextRank: Bringing Order into Texts](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) -##### 基本思想: +#### 基本思想: 1. 将待抽取关键词的文本进行分词 -2. 以固定窗口大小(我选的5,可适当调整),词之间的共现关系,构建图 +2. 以固定窗口大小(默认为5,通过span属性调整),词之间的共现关系,构建图 3. 计算图中节点的PageRank,注意是无向带权图 -##### 基本使用: -jieba.analyse.textrank(raw_text) +#### 使用示例: -##### 示例结果: -来自`__main__`的示例结果: +见 [test/demo.py](https://github.com/fxsjy/jieba/blob/master/test/demo.py) -``` -吉林 1.0 -欧亚 0.864834432786 -置业 0.553465925497 -实现 0.520660869531 -收入 0.379699688954 -增资 0.355086023683 -子公司 0.349758490263 -全资 0.308537396283 -城市 0.306103738053 -商业 0.304837414946 -``` - -4) : 词性标注 +4. 词性标注 ----------- -* 标注句子分词后每个词的词性,采用和 ictclas 兼容的标记法 +* `jieba.posseg.POSTokenizer(tokenizer=None)` 新建自定义分词器,`tokenizer` 参数可指定内部使用的 `jieba.Tokenizer` 分词器。`jieba.posseg.dt` 为默认词性标注分词器。 +* 标注句子分词后每个词的词性,采用和 ictclas 兼容的标记法。 * 用法示例 ```pycon @@ -200,10 +200,10 @@ jieba.analyse.textrank(raw_text) 天安门 ns ``` -5) : 并行分词 +5. 
并行分词 ----------- -* 原理:将目标文本按行分隔后,把各行文本分配到多个 python 进程并行分词,然后归并结果,从而获得分词速度的可观提升 -* 基于 python 自带的 multiprocessing 模块,目前暂不支持 windows +* 原理:将目标文本按行分隔后,把各行文本分配到多个 Python 进程并行分词,然后归并结果,从而获得分词速度的可观提升 +* 基于 python 自带的 multiprocessing 模块,目前暂不支持 Windows * 用法: * `jieba.enable_parallel(4)` # 开启并行分词模式,参数为并行进程数 * `jieba.disable_parallel()` # 关闭并行分词模式 @@ -212,8 +212,9 @@ jieba.analyse.textrank(raw_text) * 实验结果:在 4 核 3.4GHz Linux 机器上,对金庸全集进行精确分词,获得了 1MB/s 的速度,是单进程版的 3.3 倍。 +* **注意**:并行分词仅支持默认分词器 `jieba.dt` 和 `jieba.posseg.dt`。 -6) : Tokenize:返回词语在原文的起始位置 +6. Tokenize:返回词语在原文的起止位置 ---------------------------------- * 注意,输入参数只接受 unicode * 默认模式 @@ -235,7 +236,7 @@ word 有限公司 start: 6 end:10 * 搜索模式 ```python -result = jieba.tokenize(u'永和服装饰品有限公司',mode='search') +result = jieba.tokenize(u'永和服装饰品有限公司', mode='search') for tk in result: print("word %s\t\t start: %d \t\t end:%d" % (tk[0],tk[1],tk[2])) ``` @@ -250,15 +251,15 @@ word 有限公司 start: 6 end:10 ``` -7) : ChineseAnalyzer for Whoosh 搜索引擎 +7. ChineseAnalyzer for Whoosh 搜索引擎 -------------------------------------------- * 引用: `from jieba.analyse import ChineseAnalyzer` * 用法示例:https://github.com/fxsjy/jieba/blob/master/test/test_whoosh.py -8) : 命令行分词 +8. 命令行分词 ------------------- -使用示例:`cat news.txt | python -m jieba > cut_result.txt` +使用示例:`python -m jieba news.txt > cut_result.txt` 命令行选项(翻译): @@ -310,10 +311,10 @@ word 有限公司 start: 6 end:10 If no filename specified, use STDIN instead. -模块初始化机制的改变:lazy load (从0.28版本开始) -------------------------------------------- +延迟加载机制 +------------ -jieba 采用延迟加载,"import jieba" 不会立即触发词典的加载,一旦有必要才开始加载词典构建前缀字典。如果你想手工初始 jieba,也可以手动初始化。 +jieba 采用延迟加载,`import jieba` 和 `jieba.Tokenizer()` 不会立即触发词典的加载,一旦有必要才开始加载词典构建前缀字典。如果你想手工初始 jieba,也可以手动初始化。 import jieba jieba.initialize() # 手动初始化(可选) @@ -460,12 +461,15 @@ Algorithm Main Functions ============== -1) : Cut +1. Cut -------- * The `jieba.cut` function accepts three input parameters: the first parameter is the string to be cut; the second parameter is `cut_all`, controlling the cut mode; the third parameter is to control whether to use the Hidden Markov Model. * `jieba.cut_for_search` accepts two parameter: the string to be cut; whether to use the Hidden Markov Model. This will cut the sentence into short words suitable for search engines. * The input string can be an unicode/str object, or a str/bytes object which is encoded in UTF-8 or GBK. Note that using GBK encoding is not recommended because it may be unexpectly decoded as UTF-8. -* `jieba.cut` and `jieba.cut_for_search` returns an generator, from which you can use a `for` loop to get the segmentation result (in unicode), or `list(jieba.cut( ... ))` to create a list. +* `jieba.cut` and `jieba.cut_for_search` returns an generator, from which you can use a `for` loop to get the segmentation result (in unicode). +* `jieba.lcut` and `jieba.lcut_for_search` returns a list. +* `jieba.Tokenizer(dictionary=DEFAULT_DICT)` creates a new customized Tokenizer, which enables you to use different dictionaries at the same time. `jieba.dt` is the default Tokenizer, to which almost all global functions are mapped. + **Code example: segmentation** @@ -497,7 +501,7 @@ Output: [Search Engine Mode]: 小明, 硕士, 毕业, 于, 中国, 科学, 学院, 科学院, 中国科学院, 计算, 计算所, 后, 在, 日本, 京都, 大学, 日本京都大学, 深造 -2) : Add a custom dictionary +2. Add a custom dictionary ---------------------------- ### Load dictionary @@ -505,6 +509,9 @@ Output: * Developers can specify their own custom dictionary to be included in the jieba default dictionary. 
Jieba is able to identify new words, but adding your own new words can ensure a higher accuracy. * Usage: `jieba.load_userdict(file_name) # file_name is the path of the custom dictionary` * The dictionary format is the same as that of `analyse/idf.txt`: one word per line; each line is divided into two parts, the first is the word itself, the other is the word frequency, separated by a space +* The word frequency can be omitted, then a calculated value will be used. +* Change a Tokenizer's `tmp_dir` and `cache_file` to specify the path of the cache file, for using on a restricted file system. + * Example: 云计算 5 @@ -540,12 +547,16 @@ Example: 「/台中/」/正确/应该/不会/被/切开 ``` -3) : Keyword Extraction +3. Keyword Extraction ----------------------- -* `jieba.analyse.extract_tags(sentence,topK,withWeight) # needs to first import jieba.analyse` -* `sentence`: the text to be extracted -* `topK`: return how many keywords with the highest TF/IDF weights. The default value is 20 -* `withWeight`: whether return TF/IDF weights with the keywords. The default value is False +`import jieba.analyse` + +* `jieba.analyse.extract_tags(sentence, topK=20, withWeight=False, allowPOS=())` + * `sentence`: the text to be extracted + * `topK`: return how many keywords with the highest TF/IDF weights. The default value is 20 + * `withWeight`: whether return TF/IDF weights with the keywords. The default value is False + * `allowPOS`: filter words with which POSs are included. Empty for no filtering. +* `jieba.analyse.TFIDF(idf_path=None)` creates a new TFIDF instance, `idf_path` specifies IDF file path. Example (keyword extraction) @@ -565,10 +576,15 @@ Developers can specify their own custom stop words corpus in jieba keyword extra There's also a [TextRank](http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) implementation available. -Use: `jieba.analyse.textrank(raw_text)`. +Use: `jieba.analyse.textrank(sentence, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))` -4) : Part of Speech Tagging ------------ +Note that it filters POS by default. + +`jieba.analyse.TextRank()` creates a new TextRank instance. + +4. Part of Speech Tagging +------------------------- +* `jieba.posseg.POSTokenizer(tokenizer=None)` creates a new customized Tokenizer. `tokenizer` specifies the jieba.Tokenizer to internally use. `jieba.posseg.dt` is the default POSTokenizer. * Tags the POS of each word after segmentation, using labels compatible with ictclas. * Example: @@ -584,8 +600,8 @@ Use: `jieba.analyse.textrank(raw_text)`. 天安门 ns ``` -5) : Parallel Processing ------------ +5. Parallel Processing +---------------------- * Principle: Split target text by line, assign the lines into multiple Python processes, and then merge the results, which is considerably faster. * Based on the multiprocessing module of Python. * Usage: @@ -597,8 +613,10 @@ Use: `jieba.analyse.textrank(raw_text)`. * Result: On a four-core 3.4GHz Linux machine, do accurate word segmentation on Complete Works of Jin Yong, and the speed reaches 1MB/s, which is 3.3 times faster than the single-process version. -6) : Tokenize: return words with position ----------------------------------- +* **Note** that parallel processing supports only default tokenizers, `jieba.dt` and `jieba.posseg.dt`. + +6. Tokenize: return words with position +---------------------------------------- * The input must be unicode * Default mode @@ -634,13 +652,13 @@ word 有限公司 start: 6 end:10 ``` -7) : ChineseAnalyzer for Whoosh --------------------------------------------- +7. 
ChineseAnalyzer for Whoosh +------------------------------- * `from jieba.analyse import ChineseAnalyzer` * Example: https://github.com/fxsjy/jieba/blob/master/test/test_whoosh.py -8) : Command Line Interface -------------------- +8. Command Line Interface +-------------------------------- $> python -m jieba --help usage: python -m jieba [options] filename @@ -679,7 +697,8 @@ You can also specify the dictionary (not supported before version 0.28) : Using Other Dictionaries -======== +=========================== + It is possible to use your own dictionary with Jieba, and there are also two dictionaries ready for download: 1. A smaller dictionary for a smaller memory footprint: diff --git a/jieba/__init__.py b/jieba/__init__.py index d64d16c..a00ae52 100644 --- a/jieba/__init__.py +++ b/jieba/__init__.py @@ -6,503 +6,564 @@ import re import os import sys import time -import tempfile -import marshal -from math import log -import threading -from functools import wraps import logging +import marshal +import tempfile +import threading +from math import log from hashlib import md5 from ._compat import * from . import finalseg -DICTIONARY = "dict.txt" -DICT_LOCK = threading.RLock() -FREQ = {} # to be initialized -total = 0 -user_word_tag_tab = {} -initialized = False -pool = None -tmp_dir = None +if os.name == 'nt': + from shutil import move as _replace_file +else: + _replace_file = os.rename -_curpath = os.path.normpath( - os.path.join(os.getcwd(), os.path.dirname(__file__))) +_get_module_path = lambda path: os.path.normpath(os.path.join(os.getcwd(), + os.path.dirname(__file__), path)) +_get_abs_path = lambda path: os.path.normpath(os.path.join(os.getcwd(), path)) + +DEFAULT_DICT = _get_module_path("dict.txt") log_console = logging.StreamHandler(sys.stderr) -logger = logging.getLogger(__name__) -logger.setLevel(logging.DEBUG) -logger.addHandler(log_console) +default_logger = logging.getLogger(__name__) +default_logger.setLevel(logging.DEBUG) +default_logger.addHandler(log_console) +DICT_WRITING = {} -def setLogLevel(log_level): - global logger - logger.setLevel(log_level) - - -def gen_pfdict(f_name): - lfreq = {} - ltotal = 0 - with open(f_name, 'rb') as f: - lineno = 0 - for line in f.read().rstrip().decode('utf-8').splitlines(): - lineno += 1 - try: - word, freq = line.split(' ')[:2] - freq = int(freq) - lfreq[word] = freq - ltotal += freq - for ch in xrange(len(word)): - wfrag = word[:ch + 1] - if wfrag not in lfreq: - lfreq[wfrag] = 0 - except ValueError as e: - logger.debug('%s at line %s %s' % (f_name, lineno, line)) - raise e - return lfreq, ltotal - - -def initialize(dictionary=None): - global FREQ, total, initialized, DICTIONARY, DICT_LOCK, tmp_dir - if not dictionary: - dictionary = DICTIONARY - with DICT_LOCK: - if initialized: - return - - abs_path = os.path.join(_curpath, dictionary) - logger.debug("Building prefix dict from %s ..." 
% abs_path) - t1 = time.time() - # default dictionary - if abs_path == os.path.join(_curpath, "dict.txt"): - cache_file = os.path.join(tmp_dir if tmp_dir else tempfile.gettempdir(),"jieba.cache") - else: # custom dictionary - cache_file = os.path.join(tmp_dir if tmp_dir else tempfile.gettempdir(),"jieba.u%s.cache" % md5( - abs_path.encode('utf-8', 'replace')).hexdigest()) - - load_from_cache_fail = True - if os.path.isfile(cache_file) and os.path.getmtime(cache_file) > os.path.getmtime(abs_path): - logger.debug("Loading model from cache %s" % cache_file) - try: - with open(cache_file, 'rb') as cf: - FREQ, total = marshal.load(cf) - load_from_cache_fail = False - except Exception: - load_from_cache_fail = True - - if load_from_cache_fail: - FREQ, total = gen_pfdict(abs_path) - logger.debug("Dumping model to file cache %s" % cache_file) - try: - fd, fpath = tempfile.mkstemp() - with os.fdopen(fd, 'wb') as temp_cache_file: - marshal.dump((FREQ, total), temp_cache_file) - if os.name == 'nt': - from shutil import move as replace_file - else: - replace_file = os.rename - replace_file(fpath, cache_file) - except Exception: - logger.exception("Dump cache file failed.") - - initialized = True - - logger.debug("Loading model cost %s seconds." % (time.time() - t1)) - logger.debug("Prefix dict has been built succesfully.") - - -def require_initialized(fn): - - @wraps(fn) - def wrapped(*args, **kwargs): - global initialized - if initialized: - return fn(*args, **kwargs) - else: - initialize(DICTIONARY) - return fn(*args, **kwargs) - - return wrapped - - -def __cut_all(sentence): - dag = get_DAG(sentence) - old_j = -1 - for k, L in iteritems(dag): - if len(L) == 1 and k > old_j: - yield sentence[k:L[0] + 1] - old_j = L[0] - else: - for j in L: - if j > k: - yield sentence[k:j + 1] - old_j = j - - -def calc(sentence, DAG, route): - N = len(sentence) - route[N] = (0, 0) - logtotal = log(total) - for idx in xrange(N - 1, -1, -1): - route[idx] = max((log(FREQ.get(sentence[idx:x + 1]) or 1) - - logtotal + route[x + 1][0], x) for x in DAG[idx]) - - -@require_initialized -def get_DAG(sentence): - global FREQ - DAG = {} - N = len(sentence) - for k in xrange(N): - tmplist = [] - i = k - frag = sentence[k] - while i < N and frag in FREQ: - if FREQ[frag]: - tmplist.append(i) - i += 1 - frag = sentence[k:i + 1] - if not tmplist: - tmplist.append(k) - DAG[k] = tmplist - return DAG +pool = None re_eng = re.compile('[a-zA-Z0-9]', re.U) - -def __cut_DAG_NO_HMM(sentence): - DAG = get_DAG(sentence) - route = {} - calc(sentence, DAG, route) - x = 0 - N = len(sentence) - buf = '' - while x < N: - y = route[x][1] + 1 - l_word = sentence[x:y] - if re_eng.match(l_word) and len(l_word) == 1: - buf += l_word - x = y - else: - if buf: - yield buf - buf = '' - yield l_word - x = y - if buf: - yield buf - buf = '' - - -def __cut_DAG(sentence): - DAG = get_DAG(sentence) - route = {} - calc(sentence, DAG, route=route) - x = 0 - buf = '' - N = len(sentence) - while x < N: - y = route[x][1] + 1 - l_word = sentence[x:y] - if y - x == 1: - buf += l_word - else: - if buf: - if len(buf) == 1: - yield buf - buf = '' - else: - if not FREQ.get(buf): - recognized = finalseg.cut(buf) - for t in recognized: - yield t - else: - for elem in buf: - yield elem - buf = '' - yield l_word - x = y - - if buf: - if len(buf) == 1: - yield buf - elif not FREQ.get(buf): - recognized = finalseg.cut(buf) - for t in recognized: - yield t - else: - for elem in buf: - yield elem - +# \u4E00-\u9FA5a-zA-Z0-9+#&\._ : All non-space characters. 
Will be handled with re_han +# \r\n|\s : whitespace characters. Will not be handled. re_han_default = re.compile("([\u4E00-\u9FA5a-zA-Z0-9+#&\._]+)", re.U) re_skip_default = re.compile("(\r\n|\s)", re.U) re_han_cut_all = re.compile("([\u4E00-\u9FA5]+)", re.U) re_skip_cut_all = re.compile("[^a-zA-Z0-9+#\n]", re.U) +def setLogLevel(log_level): + global logger + default_logger.setLevel(log_level) -def cut(sentence, cut_all=False, HMM=True): - ''' - The main function that segments an entire sentence that contains - Chinese characters into seperated words. +class Tokenizer(object): - Parameter: - - sentence: The str(unicode) to be segmented. - - cut_all: Model type. True for full pattern, False for accurate pattern. - - HMM: Whether to use the Hidden Markov Model. - ''' - sentence = strdecode(sentence) + def __init__(self, dictionary=DEFAULT_DICT): + self.lock = threading.RLock() + self.dictionary = _get_abs_path(dictionary) + self.FREQ = {} + self.total = 0 + self.user_word_tag_tab = {} + self.initialized = False + self.tmp_dir = None + self.cache_file = None - # \u4E00-\u9FA5a-zA-Z0-9+#&\._ : All non-space characters. Will be handled with re_han - # \r\n|\s : whitespace characters. Will not be handled. + def __repr__(self): + return '' % self.dictionary - if cut_all: - re_han = re_han_cut_all - re_skip = re_skip_cut_all - else: - re_han = re_han_default - re_skip = re_skip_default - blocks = re_han.split(sentence) - if cut_all: - cut_block = __cut_all - elif HMM: - cut_block = __cut_DAG - else: - cut_block = __cut_DAG_NO_HMM - for blk in blocks: - if not blk: - continue - if re_han.match(blk): - for word in cut_block(blk): - yield word + def gen_pfdict(self, f_name): + lfreq = {} + ltotal = 0 + with open(f_name, 'rb') as f: + for lineno, line in enumerate(f, 1): + try: + line = line.strip().decode('utf-8') + word, freq = line.split(' ')[:2] + freq = int(freq) + lfreq[word] = freq + ltotal += freq + for ch in xrange(len(word)): + wfrag = word[:ch + 1] + if wfrag not in lfreq: + lfreq[wfrag] = 0 + except ValueError: + raise ValueError( + 'invalid dictionary entry in %s at Line %s: %s' % (f_name, lineno, line)) + return lfreq, ltotal + + def initialize(self, dictionary=None): + if dictionary: + abs_path = _get_abs_path(dictionary) + if self.dictionary == abs_path and self.initialized: + return + else: + self.dictionary = abs_path + self.initialized = False else: - tmp = re_skip.split(blk) - for x in tmp: - if re_skip.match(x): - yield x - elif not cut_all: - for xx in x: - yield xx - else: - yield x + abs_path = self.dictionary + with self.lock: + try: + with DICT_WRITING[abs_path]: + pass + except KeyError: + pass + if self.initialized: + return -def cut_for_search(sentence, HMM=True): - """ - Finer segmentation for search engines. - """ - words = cut(sentence, HMM=HMM) - for w in words: - if len(w) > 2: - for i in xrange(len(w) - 1): - gram2 = w[i:i + 2] - if FREQ.get(gram2): - yield gram2 - if len(w) > 3: - for i in xrange(len(w) - 2): - gram3 = w[i:i + 3] - if FREQ.get(gram3): - yield gram3 - yield w + default_logger.debug("Building prefix dict from %s ..." 
% abs_path) + t1 = time.time() + if self.cache_file: + cache_file = self.cache_file + # default dictionary + elif abs_path == DEFAULT_DICT: + cache_file = "jieba.cache" + else: # custom dictionary + cache_file = "jieba.u%s.cache" % md5( + abs_path.encode('utf-8', 'replace')).hexdigest() + cache_file = os.path.join( + self.tmp_dir or tempfile.gettempdir(), cache_file) + load_from_cache_fail = True + if os.path.isfile(cache_file) and os.path.getmtime(cache_file) > os.path.getmtime(abs_path): + default_logger.debug( + "Loading model from cache %s" % cache_file) + try: + with open(cache_file, 'rb') as cf: + self.FREQ, self.total = marshal.load(cf) + load_from_cache_fail = False + except Exception: + load_from_cache_fail = True -@require_initialized -def load_userdict(f): - ''' - Load personalized dict to improve detect rate. + if load_from_cache_fail: + wlock = DICT_WRITING.get(abs_path, threading.RLock()) + DICT_WRITING[abs_path] = wlock + with wlock: + self.FREQ, self.total = self.gen_pfdict(abs_path) + default_logger.debug( + "Dumping model to file cache %s" % cache_file) + try: + fd, fpath = tempfile.mkstemp() + with os.fdopen(fd, 'wb') as temp_cache_file: + marshal.dump( + (self.FREQ, self.total), temp_cache_file) + _replace_file(fpath, cache_file) + except Exception: + default_logger.exception("Dump cache file failed.") - Parameter: - - f : A plain text file contains words and their ocurrences. + try: + del DICT_WRITING[abs_path] + except KeyError: + pass - Structure of dict file: - word1 freq1 word_type1 - word2 freq2 word_type2 - ... - Word type may be ignored - ''' - if isinstance(f, string_types): - f = open(f, 'rb') - content = f.read().decode('utf-8').lstrip('\ufeff') - line_no = 0 - for line in content.splitlines(): - try: - line_no += 1 - line = line.strip() - if not line: - continue - tup = line.split(" ") - add_word(*tup) - except Exception as e: - logger.debug('%s at line %s %s' % (f.name, line_no, line)) - raise e + self.initialized = True + default_logger.debug( + "Loading model cost %.3f seconds." % (time.time() - t1)) + default_logger.debug("Prefix dict has been built succesfully.") + def check_initialized(self): + if not self.initialized: + self.initialize() -@require_initialized -def add_word(word, freq=None, tag=None): - """ - Add a word to dictionary. + def calc(self, sentence, DAG, route): + N = len(sentence) + route[N] = (0, 0) + logtotal = log(self.total) + for idx in xrange(N - 1, -1, -1): + route[idx] = max((log(self.FREQ.get(sentence[idx:x + 1]) or 1) - + logtotal + route[x + 1][0], x) for x in DAG[idx]) - freq and tag can be omitted, freq defaults to be a calculated value - that ensures the word can be cut out. 
- """ - global FREQ, total, user_word_tag_tab - word = strdecode(word) - if freq is None: - freq = suggest_freq(word, False) - else: - freq = int(freq) - FREQ[word] = freq - total += freq - if tag is not None: - user_word_tag_tab[word] = tag - for ch in xrange(len(word)): - wfrag = word[:ch + 1] - if wfrag not in FREQ: - FREQ[wfrag] = 0 + def get_DAG(self, sentence): + self.check_initialized() + DAG = {} + N = len(sentence) + for k in xrange(N): + tmplist = [] + i = k + frag = sentence[k] + while i < N and frag in self.FREQ: + if self.FREQ[frag]: + tmplist.append(i) + i += 1 + frag = sentence[k:i + 1] + if not tmplist: + tmplist.append(k) + DAG[k] = tmplist + return DAG + def __cut_all(self, sentence): + dag = self.get_DAG(sentence) + old_j = -1 + for k, L in iteritems(dag): + if len(L) == 1 and k > old_j: + yield sentence[k:L[0] + 1] + old_j = L[0] + else: + for j in L: + if j > k: + yield sentence[k:j + 1] + old_j = j -def del_word(word): - """ - Convenient function for deleting a word. - """ - add_word(word, 0) + def __cut_DAG_NO_HMM(self, sentence): + DAG = self.get_DAG(sentence) + route = {} + self.calc(sentence, DAG, route) + x = 0 + N = len(sentence) + buf = '' + while x < N: + y = route[x][1] + 1 + l_word = sentence[x:y] + if re_eng.match(l_word) and len(l_word) == 1: + buf += l_word + x = y + else: + if buf: + yield buf + buf = '' + yield l_word + x = y + if buf: + yield buf + buf = '' + def __cut_DAG(self, sentence): + DAG = self.get_DAG(sentence) + route = {} + self.calc(sentence, DAG, route) + x = 0 + buf = '' + N = len(sentence) + while x < N: + y = route[x][1] + 1 + l_word = sentence[x:y] + if y - x == 1: + buf += l_word + else: + if buf: + if len(buf) == 1: + yield buf + buf = '' + else: + if not self.FREQ.get(buf): + recognized = finalseg.cut(buf) + for t in recognized: + yield t + else: + for elem in buf: + yield elem + buf = '' + yield l_word + x = y -@require_initialized -def suggest_freq(segment, tune=False): - """ - Suggest word frequency to force the characters in a word to be - joined or splitted. + if buf: + if len(buf) == 1: + yield buf + elif not self.FREQ.get(buf): + recognized = finalseg.cut(buf) + for t in recognized: + yield t + else: + for elem in buf: + yield elem - Parameter: - - segment : The segments that the word is expected to be cut into, - If the word should be treated as a whole, use a str. - - tune : If True, tune the word frequency. + def cut(self, sentence, cut_all=False, HMM=True): + ''' + The main function that segments an entire sentence that contains + Chinese characters into seperated words. - Note that HMM may affect the final result. If the result doesn't change, - set HMM=False. - """ - ftotal = float(total) - freq = 1 - if isinstance(segment, string_types): - word = segment - for seg in cut(word, HMM=False): - freq *= FREQ.get(seg, 1) / ftotal - freq = max(int(freq*total) + 1, FREQ.get(word, 1)) - else: - segment = tuple(map(strdecode, segment)) - word = ''.join(segment) - for seg in segment: - freq *= FREQ.get(seg, 1) / ftotal - freq = min(int(freq*total), FREQ.get(word, 0)) - if tune: - add_word(word, freq) - return freq + Parameter: + - sentence: The str(unicode) to be segmented. + - cut_all: Model type. True for full pattern, False for accurate pattern. + - HMM: Whether to use the Hidden Markov Model. 
+ ''' + sentence = strdecode(sentence) - -__ref_cut = cut -__ref_cut_for_search = cut_for_search - - -def __lcut(sentence): - return list(__ref_cut(sentence, False)) - - -def __lcut_no_hmm(sentence): - return list(__ref_cut(sentence, False, False)) - - -def __lcut_all(sentence): - return list(__ref_cut(sentence, True)) - - -def __lcut_for_search(sentence): - return list(__ref_cut_for_search(sentence)) - - -@require_initialized -def enable_parallel(processnum=None): - global pool, cut, cut_for_search - if os.name == 'nt': - raise Exception("jieba: parallel mode only supports posix system") - from multiprocessing import Pool, cpu_count - if processnum is None: - processnum = cpu_count() - pool = Pool(processnum) - - def pcut(sentence, cut_all=False, HMM=True): - parts = strdecode(sentence).splitlines(True) if cut_all: - result = pool.map(__lcut_all, parts) - elif HMM: - result = pool.map(__lcut, parts) + re_han = re_han_cut_all + re_skip = re_skip_cut_all else: - result = pool.map(__lcut_no_hmm, parts) - for r in result: - for w in r: - yield w + re_han = re_han_default + re_skip = re_skip_default + if cut_all: + cut_block = self.__cut_all + elif HMM: + cut_block = self.__cut_DAG + else: + cut_block = self.__cut_DAG_NO_HMM + blocks = re_han.split(sentence) + for blk in blocks: + if not blk: + continue + if re_han.match(blk): + for word in cut_block(blk): + yield word + else: + tmp = re_skip.split(blk) + for x in tmp: + if re_skip.match(x): + yield x + elif not cut_all: + for xx in x: + yield xx + else: + yield x - def pcut_for_search(sentence): - parts = strdecode(sentence).splitlines(True) - result = pool.map(__lcut_for_search, parts) - for r in result: - for w in r: - yield w - - cut = pcut - cut_for_search = pcut_for_search - - -def disable_parallel(): - global pool, cut, cut_for_search - if pool: - pool.close() - pool = None - cut = __ref_cut - cut_for_search = __ref_cut_for_search - - -def set_dictionary(dictionary_path): - global initialized, DICTIONARY - with DICT_LOCK: - abs_path = os.path.normpath(os.path.join(os.getcwd(), dictionary_path)) - if not os.path.isfile(abs_path): - raise Exception("jieba: file does not exist: " + abs_path) - DICTIONARY = abs_path - initialized = False - - -def get_abs_path_dict(): - return os.path.join(_curpath, DICTIONARY) - - -def tokenize(unicode_sentence, mode="default", HMM=True): - """ - Tokenize a sentence and yields tuples of (word, start, end) - - Parameter: - - sentence: the str(unicode) to be segmented. - - mode: "default" or "search", "search" is for finer segmentation. - - HMM: whether to use the Hidden Markov Model. - """ - if not isinstance(unicode_sentence, text_type): - raise Exception("jieba: the input parameter should be unicode.") - start = 0 - if mode == 'default': - for w in cut(unicode_sentence, HMM=HMM): - width = len(w) - yield (w, start, start + width) - start += width - else: - for w in cut(unicode_sentence, HMM=HMM): - width = len(w) + def cut_for_search(self, sentence, HMM=True): + """ + Finer segmentation for search engines. 
+ """ + words = self.cut(sentence, HMM=HMM) + for w in words: if len(w) > 2: for i in xrange(len(w) - 1): gram2 = w[i:i + 2] if FREQ.get(gram2): - yield (gram2, start + i, start + i + 2) + yield gram2 if len(w) > 3: for i in xrange(len(w) - 2): gram3 = w[i:i + 3] if FREQ.get(gram3): - yield (gram3, start + i, start + i + 3) - yield (w, start, start + width) - start += width + yield gram3 + yield w + + def lcut(self, *args, **kwargs): + return list(self.cut(*args, **kwargs)) + + def lcut_for_search(self, *args, **kwargs): + return list(self.cut_for_search(*args, **kwargs)) + + _lcut = lcut + _lcut_for_search = lcut_for_search + + def _lcut_no_hmm(self, sentence): + return self.lcut(sentence, False, False) + + def _lcut_all(self, sentence): + return self.lcut(sentence, True) + + def _lcut_for_search_no_hmm(self, sentence): + return self.lcut_for_search(sentence, False) + + def get_abs_path_dict(self): + return _get_abs_path(self.dictionary) + + def load_userdict(self, f): + ''' + Load personalized dict to improve detect rate. + + Parameter: + - f : A plain text file contains words and their ocurrences. + + Structure of dict file: + word1 freq1 word_type1 + word2 freq2 word_type2 + ... + Word type may be ignored + ''' + self.check_initialized() + if isinstance(f, string_types): + f = open(f, 'rb') + for lineno, ln in enumerate(f, 1): + try: + line = ln.strip().decode('utf-8').lstrip('\ufeff') + if not line: + continue + tup = line.split(" ") + self.add_word(*tup) + except Exception: + raise ValueError( + 'invalid dictionary entry in %s at Line %s: %s' % ( + f.name, lineno, line)) + + def add_word(self, word, freq=None, tag=None): + """ + Add a word to dictionary. + + freq and tag can be omitted, freq defaults to be a calculated value + that ensures the word can be cut out. + """ + self.check_initialized() + word = strdecode(word) + if freq is None: + freq = self.suggest_freq(word, False) + else: + freq = int(freq) + self.FREQ[word] = freq + self.total += freq + if tag is not None: + self.user_word_tag_tab[word] = tag + for ch in xrange(len(word)): + wfrag = word[:ch + 1] + if wfrag not in self.FREQ: + self.FREQ[wfrag] = 0 + + def del_word(self, word): + """ + Convenient function for deleting a word. + """ + self.add_word(word, 0) + + def suggest_freq(self, segment, tune=False): + """ + Suggest word frequency to force the characters in a word to be + joined or splitted. + + Parameter: + - segment : The segments that the word is expected to be cut into, + If the word should be treated as a whole, use a str. + - tune : If True, tune the word frequency. + + Note that HMM may affect the final result. If the result doesn't change, + set HMM=False. + """ + self.check_initialized() + ftotal = float(self.total) + freq = 1 + if isinstance(segment, string_types): + word = segment + for seg in self.cut(word, HMM=False): + freq *= self.FREQ.get(seg, 1) / ftotal + freq = max(int(freq * self.total) + 1, self.FREQ.get(word, 1)) + else: + segment = tuple(map(strdecode, segment)) + word = ''.join(segment) + for seg in segment: + freq *= self.FREQ.get(seg, 1) / ftotal + freq = min(int(freq * self.total), self.FREQ.get(word, 0)) + if tune: + add_word(word, freq) + return freq + + def tokenize(self, unicode_sentence, mode="default", HMM=True): + """ + Tokenize a sentence and yields tuples of (word, start, end) + + Parameter: + - sentence: the str(unicode) to be segmented. + - mode: "default" or "search", "search" is for finer segmentation. + - HMM: whether to use the Hidden Markov Model. 
+ """ + if not isinstance(unicode_sentence, text_type): + raise ValueError("jieba: the input parameter should be unicode.") + start = 0 + if mode == 'default': + for w in self.cut(unicode_sentence, HMM=HMM): + width = len(w) + yield (w, start, start + width) + start += width + else: + for w in self.cut(unicode_sentence, HMM=HMM): + width = len(w) + if len(w) > 2: + for i in xrange(len(w) - 1): + gram2 = w[i:i + 2] + if self.FREQ.get(gram2): + yield (gram2, start + i, start + i + 2) + if len(w) > 3: + for i in xrange(len(w) - 2): + gram3 = w[i:i + 3] + if self.FREQ.get(gram3): + yield (gram3, start + i, start + i + 3) + yield (w, start, start + width) + start += width + + def set_dictionary(self, dictionary_path): + with self.lock: + abs_path = _get_abs_path(dictionary_path) + if not os.path.isfile(abs_path): + raise Exception("jieba: file does not exist: " + abs_path) + self.dictionary = abs_path + self.initialized = False + + +# default Tokenizer instance + +dt = Tokenizer() + +# global functions + +FREQ = dt.FREQ +add_word = dt.add_word +calc = dt.calc +cut = dt.cut +lcut = dt.lcut +cut_for_search = dt.cut_for_search +lcut_for_search = dt.lcut_for_search +del_word = dt.del_word +get_DAG = dt.get_DAG +get_abs_path_dict = dt.get_abs_path_dict +initialize = dt.initialize +load_userdict = dt.load_userdict +set_dictionary = dt.set_dictionary +suggest_freq = dt.suggest_freq +tokenize = dt.tokenize +user_word_tag_tab = dt.user_word_tag_tab + + +def _lcut_all(s): + return dt._lcut_all(s) + + +def _lcut(s): + return dt._lcut(s) + + +def _lcut_all(s): + return dt._lcut_all(s) + + +def _lcut_for_search(s): + return dt._lcut_for_search(s) + + +def _lcut_for_search_no_hmm(s): + return dt._lcut_for_search_no_hmm(s) + + +def _pcut(sentence, cut_all=False, HMM=True): + parts = strdecode(sentence).splitlines(True) + if cut_all: + result = pool.map(_lcut_all, parts) + elif HMM: + result = pool.map(_lcut, parts) + else: + result = pool.map(_lcut_no_hmm, parts) + for r in result: + for w in r: + yield w + + +def _pcut_for_search(sentence, HMM=True): + parts = strdecode(sentence).splitlines(True) + if HMM: + result = pool.map(_lcut_for_search, parts) + else: + result = pool.map(_lcut_for_search_no_hmm, parts) + for r in result: + for w in r: + yield w + + +def enable_parallel(processnum=None): + """ + Change the module's `cut` and `cut_for_search` functions to the + parallel version. + + Note that this only works using dt, custom Tokenizer + instances are not supported. 
+ """ + global pool, dt, cut, cut_for_search + from multiprocessing import cpu_count + if os.name == 'nt': + raise NotImplementedError( + "jieba: parallel mode only supports posix system") + else: + from multiprocessing import Pool + dt.check_initialized() + if processnum is None: + processnum = cpu_count() + pool = Pool(processnum) + cut = _pcut + cut_for_search = _pcut_for_search + + +def disable_parallel(): + global pool, dt, cut, cut_for_search + if pool: + pool.close() + pool = None + cut = dt.cut + cut_for_search = dt.cut_for_search diff --git a/jieba/analyse/__init__.py b/jieba/analyse/__init__.py index da2514c..f956ef5 100755 --- a/jieba/analyse/__init__.py +++ b/jieba/analyse/__init__.py @@ -1,103 +1,18 @@ -#encoding=utf-8 from __future__ import absolute_import -import jieba -import jieba.posseg -import os -from operator import itemgetter -from .textrank import textrank +from .tfidf import TFIDF +from .textrank import TextRank try: from .analyzer import ChineseAnalyzer except ImportError: pass -_curpath = os.path.normpath(os.path.join(os.getcwd(), os.path.dirname(__file__))) -abs_path = os.path.join(_curpath, "idf.txt") +default_tfidf = TFIDF() +default_textrank = TextRank() -STOP_WORDS = set(( - "the","of","is","and","to","in","that","we","for","an","are", - "by","be","as","on","with","can","if","from","which","you","it", - "this","then","at","have","all","not","one","has","or","that" -)) - -class IDFLoader: - def __init__(self): - self.path = "" - self.idf_freq = {} - self.median_idf = 0.0 - - def set_new_path(self, new_idf_path): - if self.path != new_idf_path: - content = open(new_idf_path, 'rb').read().decode('utf-8') - idf_freq = {} - lines = content.rstrip('\n').split('\n') - for line in lines: - word, freq = line.split(' ') - idf_freq[word] = float(freq) - median_idf = sorted(idf_freq.values())[len(idf_freq)//2] - self.idf_freq = idf_freq - self.median_idf = median_idf - self.path = new_idf_path - - def get_idf(self): - return self.idf_freq, self.median_idf - -idf_loader = IDFLoader() -idf_loader.set_new_path(abs_path) - -def set_idf_path(idf_path): - new_abs_path = os.path.normpath(os.path.join(os.getcwd(), idf_path)) - if not os.path.exists(new_abs_path): - raise Exception("jieba: path does not exist: " + new_abs_path) - idf_loader.set_new_path(new_abs_path) +extract_tags = tfidf = default_tfidf.extract_tags +set_idf_path = default_tfidf.set_idf_path +textrank = default_textrank.extract_tags def set_stop_words(stop_words_path): - global STOP_WORDS - abs_path = os.path.normpath(os.path.join(os.getcwd(), stop_words_path)) - if not os.path.exists(abs_path): - raise Exception("jieba: path does not exist: " + abs_path) - content = open(abs_path,'rb').read().decode('utf-8') - lines = content.replace("\r", "").split('\n') - for line in lines: - STOP_WORDS.add(line) - -def extract_tags(sentence, topK=20, withWeight=False, allowPOS=[]): - """ - Extract keywords from sentence using TF-IDF algorithm. - Parameter: - - topK: return how many top keywords. `None` for all possible words. - - withWeight: if True, return a list of (word, weight); - if False, return a list of words. - - allowPOS: the allowed POS list eg. ['ns', 'n', 'vn', 'v','nr']. - if the POS of w is not in this list,it will be filtered. 
- """ - global STOP_WORDS, idf_loader - - idf_freq, median_idf = idf_loader.get_idf() - - if allowPOS: - allowPOS = frozenset(allowPOS) - words = jieba.posseg.cut(sentence) - else: - words = jieba.cut(sentence) - freq = {} - for w in words: - if allowPOS: - if w.flag not in allowPOS: - continue - else: - w = w.word - if len(w.strip()) < 2 or w.lower() in STOP_WORDS: - continue - freq[w] = freq.get(w, 0.0) + 1.0 - total = sum(freq.values()) - for k in freq: - freq[k] *= idf_freq.get(k, median_idf) / total - - if withWeight: - tags = sorted(freq.items(), key=itemgetter(1), reverse=True) - else: - tags = sorted(freq, key=freq.__getitem__, reverse=True) - if topK: - return tags[:topK] - else: - return tags + default_tfidf.set_stop_words(stop_words_path) + default_textrank.set_stop_words(stop_words_path) diff --git a/jieba/analyse/analyzer.py b/jieba/analyse/analyzer.py index 46de250..7f5d8f1 100644 --- a/jieba/analyse/analyzer.py +++ b/jieba/analyse/analyzer.py @@ -1,7 +1,7 @@ -#encoding=utf-8 +# encoding=utf-8 from __future__ import unicode_literals -from whoosh.analysis import RegexAnalyzer,LowercaseFilter,StopFilter,StemFilter -from whoosh.analysis import Tokenizer,Token +from whoosh.analysis import RegexAnalyzer, LowercaseFilter, StopFilter, StemFilter +from whoosh.analysis import Tokenizer, Token from whoosh.lang.porter import stem import jieba @@ -15,12 +15,14 @@ STOP_WORDS = frozenset(('a', 'an', 'and', 'are', 'as', 'at', 'be', 'by', 'can', accepted_chars = re.compile(r"[\u4E00-\u9FA5]+") + class ChineseTokenizer(Tokenizer): + def __call__(self, text, **kargs): words = jieba.tokenize(text, mode="search") token = Token() - for (w,start_pos,stop_pos) in words: - if not accepted_chars.match(w) and len(w)<=1: + for (w, start_pos, stop_pos) in words: + if not accepted_chars.match(w) and len(w) <= 1: continue token.original = token.text = w token.pos = start_pos @@ -28,7 +30,8 @@ class ChineseTokenizer(Tokenizer): token.endchar = stop_pos yield token + def ChineseAnalyzer(stoplist=STOP_WORDS, minsize=1, stemfn=stem, cachesize=50000): return (ChineseTokenizer() | LowercaseFilter() | - StopFilter(stoplist=stoplist,minsize=minsize) | - StemFilter(stemfn=stemfn, ignore=None,cachesize=cachesize)) + StopFilter(stoplist=stoplist, minsize=minsize) | + StemFilter(stemfn=stemfn, ignore=None, cachesize=cachesize)) diff --git a/jieba/analyse/textrank.py b/jieba/analyse/textrank.py index 94d7f1b..019a1cb 100644 --- a/jieba/analyse/textrank.py +++ b/jieba/analyse/textrank.py @@ -3,9 +3,10 @@ from __future__ import absolute_import, unicode_literals import sys -import collections from operator import itemgetter -import jieba.posseg as pseg +from collections import defaultdict +import jieba.posseg +from .tfidf import KeywordExtractor from .._compat import * @@ -13,7 +14,7 @@ class UndirectWeightedGraph: d = 0.85 def __init__(self): - self.graph = collections.defaultdict(list) + self.graph = defaultdict(list) def addEdge(self, start, end, weight): # use a tuple (start, end, weight) instead of a Edge object @@ -21,8 +22,8 @@ class UndirectWeightedGraph: self.graph[end].append((end, start, weight)) def rank(self): - ws = collections.defaultdict(float) - outSum = collections.defaultdict(float) + ws = defaultdict(float) + outSum = defaultdict(float) wsdef = 1.0 / (len(self.graph) or 1.0) for n, out in self.graph.items(): @@ -53,43 +54,51 @@ class UndirectWeightedGraph: return ws -def textrank(sentence, topK=10, withWeight=False, allowPOS=['ns', 'n', 'vn', 'v']): - """ - Extract keywords from sentence using 
TextRank algorithm. - Parameter: - - topK: return how many top keywords. `None` for all possible words. - - withWeight: if True, return a list of (word, weight); - if False, return a list of words. - - allowPOS: the allowed POS list eg. ['ns', 'n', 'vn', 'v']. - if the POS of w is not in this list,it will be filtered. - """ - pos_filt = frozenset(allowPOS) - g = UndirectWeightedGraph() - cm = collections.defaultdict(int) - span = 5 - words = list(pseg.cut(sentence)) - for i in xrange(len(words)): - if words[i].flag in pos_filt: - for j in xrange(i + 1, i + span): - if j >= len(words): - break - if words[j].flag not in pos_filt: - continue - cm[(words[i].word, words[j].word)] += 1 +class TextRank(KeywordExtractor): - for terms, w in cm.items(): - g.addEdge(terms[0], terms[1], w) - nodes_rank = g.rank() - if withWeight: - tags = sorted(nodes_rank.items(), key=itemgetter(1), reverse=True) - else: - tags = sorted(nodes_rank, key=nodes_rank.__getitem__, reverse=True) - if topK: - return tags[:topK] - else: - return tags + def __init__(self): + self.tokenizer = self.postokenizer = jieba.posseg.dt + self.stop_words = self.STOP_WORDS.copy() + self.pos_filt = frozenset(('ns', 'n', 'vn', 'v')) + self.span = 5 -if __name__ == '__main__': - s = "此外,公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元,增资后,吉林欧亚置业注册资本由7000万元增加到5亿元。吉林欧亚置业主要经营范围为房地产开发及百货零售等业务。目前在建吉林欧亚城市商业综合体项目。2013年,实现营业收入0万元,实现净利润-139.13万元。" - for x, w in textrank(s, withWeight=True): - print('%s %s' % (x, w)) + def pairfilter(self, wp): + return (wp.flag in self.pos_filt and len(wp.word.strip()) >= 2 + and wp.word.lower() not in self.stop_words) + + def textrank(self, sentence, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v')): + """ + Extract keywords from sentence using TextRank algorithm. + Parameter: + - topK: return how many top keywords. `None` for all possible words. + - withWeight: if True, return a list of (word, weight); + if False, return a list of words. + - allowPOS: the allowed POS list eg. ['ns', 'n', 'vn', 'v']. + if the POS of w is not in this list, it will be filtered. 
+ """ + self.pos_filt = frozenset(allowPOS) + g = UndirectWeightedGraph() + cm = defaultdict(int) + words = tuple(self.tokenizer.cut(sentence)) + for i, wp in enumerate(words): + if self.pairfilter(wp): + for j in xrange(i + 1, i + self.span): + if j >= len(words): + break + if not self.pairfilter(words[j]): + continue + cm[(wp.word, words[j].word)] += 1 + + for terms, w in cm.items(): + g.addEdge(terms[0], terms[1], w) + nodes_rank = g.rank() + if withWeight: + tags = sorted(nodes_rank.items(), key=itemgetter(1), reverse=True) + else: + tags = sorted(nodes_rank, key=nodes_rank.__getitem__, reverse=True) + if topK: + return tags[:topK] + else: + return tags + + extract_tags = textrank diff --git a/jieba/analyse/tfidf.py b/jieba/analyse/tfidf.py new file mode 100755 index 0000000..14abfb0 --- /dev/null +++ b/jieba/analyse/tfidf.py @@ -0,0 +1,111 @@ +# encoding=utf-8 +from __future__ import absolute_import +import os +import jieba +import jieba.posseg +from operator import itemgetter + +_get_module_path = lambda path: os.path.normpath(os.path.join(os.getcwd(), + os.path.dirname(__file__), path)) +_get_abs_path = jieba._get_abs_path + +DEFAULT_IDF = _get_module_path("idf.txt") + + +class KeywordExtractor(object): + + STOP_WORDS = set(( + "the", "of", "is", "and", "to", "in", "that", "we", "for", "an", "are", + "by", "be", "as", "on", "with", "can", "if", "from", "which", "you", "it", + "this", "then", "at", "have", "all", "not", "one", "has", "or", "that" + )) + + def set_stop_words(self, stop_words_path): + abs_path = _get_abs_path(stop_words_path) + if not os.path.isfile(abs_path): + raise Exception("jieba: file does not exist: " + abs_path) + content = open(abs_path, 'rb').read().decode('utf-8') + for line in content.splitlines(): + self.stop_words.add(line) + + def extract_tags(self, *args, **kwargs): + raise NotImplementedError + + +class IDFLoader(object): + + def __init__(self, idf_path=None): + self.path = "" + self.idf_freq = {} + self.median_idf = 0.0 + if idf_path: + self.set_new_path(idf_path) + + def set_new_path(self, new_idf_path): + if self.path != new_idf_path: + self.path = new_idf_path + content = open(new_idf_path, 'rb').read().decode('utf-8') + self.idf_freq = {} + for line in content.splitlines(): + word, freq = line.strip().split(' ') + self.idf_freq[word] = float(freq) + self.median_idf = sorted( + self.idf_freq.values())[len(self.idf_freq) // 2] + + def get_idf(self): + return self.idf_freq, self.median_idf + + +class TFIDF(KeywordExtractor): + + def __init__(self, idf_path=None): + self.tokenizer = jieba.dt + self.postokenizer = jieba.posseg.dt + self.stop_words = self.STOP_WORDS.copy() + self.idf_loader = IDFLoader(idf_path or DEFAULT_IDF) + self.idf_freq, self.median_idf = self.idf_loader.get_idf() + + def set_idf_path(self, idf_path): + new_abs_path = _get_abs_path(idf_path) + if not os.path.isfile(new_abs_path): + raise Exception("jieba: file does not exist: " + new_abs_path) + self.idf_loader.set_new_path(new_abs_path) + self.idf_freq, self.median_idf = self.idf_loader.get_idf() + + def extract_tags(self, sentence, topK=20, withWeight=False, allowPOS=()): + """ + Extract keywords from sentence using TF-IDF algorithm. + Parameter: + - topK: return how many top keywords. `None` for all possible words. + - withWeight: if True, return a list of (word, weight); + if False, return a list of words. + - allowPOS: the allowed POS list eg. ['ns', 'n', 'vn', 'v','nr']. + if the POS of w is not in this list,it will be filtered. 
+ """ + if allowPOS: + allowPOS = frozenset(allowPOS) + words = self.postokenizer.cut(sentence) + else: + words = self.tokenizer.cut(sentence) + freq = {} + for w in words: + if allowPOS: + if w.flag not in allowPOS: + continue + else: + w = w.word + if len(w.strip()) < 2 or w.lower() in self.stop_words: + continue + freq[w] = freq.get(w, 0.0) + 1.0 + total = sum(freq.values()) + for k in freq: + freq[k] *= self.idf_freq.get(k, self.median_idf) / total + + if withWeight: + tags = sorted(freq.items(), key=itemgetter(1), reverse=True) + else: + tags = sorted(freq, key=freq.__getitem__, reverse=True) + if topK: + return tags[:topK] + else: + return tags diff --git a/jieba/posseg/__init__.py b/jieba/posseg/__init__.py index 680050c..3133233 100644 --- a/jieba/posseg/__init__.py +++ b/jieba/posseg/__init__.py @@ -1,10 +1,9 @@ from __future__ import absolute_import, unicode_literals -import re import os -import jieba +import re import sys +import jieba import marshal -from functools import wraps from .._compat import * from .viterbi import viterbi @@ -24,23 +23,10 @@ re_num = re.compile("[\.0-9]+") re_eng1 = re.compile('^[a-zA-Z0-9]$', re.U) -def load_model(f_name, isJython=True): +def load_model(f_name): _curpath = os.path.normpath( os.path.join(os.getcwd(), os.path.dirname(__file__))) - - result = {} - with open(f_name, "rb") as f: - for line in f: - line = line.strip() - if not line: - continue - line = line.decode("utf-8") - word, _, tag = line.split(" ") - result[word] = tag - - if not isJython: - return result - + # For Jython start_p = {} abs_path = os.path.join(_curpath, PROB_START_P) with open(abs_path, 'rb') as f: @@ -64,29 +50,15 @@ def load_model(f_name, isJython=True): return state, start_p, trans_p, emit_p, result + if sys.platform.startswith("java"): - char_state_tab_P, start_P, trans_P, emit_P, word_tag_tab = load_model( - jieba.get_abs_path_dict()) + char_state_tab_P, start_P, trans_P, emit_P, word_tag_tab = load_model() else: from .char_state_tab import P as char_state_tab_P from .prob_start import P as start_P from .prob_trans import P as trans_P from .prob_emit import P as emit_P - word_tag_tab = load_model(jieba.get_abs_path_dict(), isJython=False) - - -def makesure_userdict_loaded(fn): - - @wraps(fn) - def wrapped(*args, **kwargs): - if jieba.user_word_tag_tab: - word_tag_tab.update(jieba.user_word_tag_tab) - jieba.user_word_tag_tab = {} - return fn(*args, **kwargs) - - return wrapped - class pair(object): @@ -110,154 +82,220 @@ class pair(object): return self.__unicode__().encode(arg) -def __cut(sentence): - prob, pos_list = viterbi( - sentence, char_state_tab_P, start_P, trans_P, emit_P) - begin, nexti = 0, 0 +class POSTokenizer(object): - for i, char in enumerate(sentence): - pos = pos_list[i][0] - if pos == 'B': - begin = i - elif pos == 'E': - yield pair(sentence[begin:i + 1], pos_list[i][1]) - nexti = i + 1 - elif pos == 'S': - yield pair(char, pos_list[i][1]) - nexti = i + 1 - if nexti < len(sentence): - yield pair(sentence[nexti:], pos_list[nexti][1]) + def __init__(self, tokenizer=None): + self.tokenizer = tokenizer or jieba.Tokenizer() + self.load_word_tag(self.tokenizer.get_abs_path_dict()) + def __repr__(self): + return '' % self.tokenizer -def __cut_detail(sentence): - blocks = re_han_detail.split(sentence) - for blk in blocks: - if re_han_detail.match(blk): - for word in __cut(blk): - yield word - else: - tmp = re_skip_detail.split(blk) - for x in tmp: - if x: - if re_num.match(x): - yield pair(x, 'm') - elif re_eng.match(x): - yield pair(x, 'eng') - else: - 
yield pair(x, 'x') + def __getattr__(self, name): + if name in ('cut_for_search', 'lcut_for_search', 'tokenize'): + # may be possible? + raise NotImplementedError + return getattr(self.tokenizer, name) + def initialize(self, dictionary=None): + self.tokenizer.initialize(dictionary) + self.load_word_tag(self.tokenizer.get_abs_path_dict()) -def __cut_DAG_NO_HMM(sentence): - DAG = jieba.get_DAG(sentence) - route = {} - jieba.calc(sentence, DAG, route) - x = 0 - N = len(sentence) - buf = '' - while x < N: - y = route[x][1] + 1 - l_word = sentence[x:y] - if re_eng1.match(l_word): - buf += l_word - x = y - else: - if buf: - yield pair(buf, 'eng') - buf = '' - yield pair(l_word, word_tag_tab.get(l_word, 'x')) - x = y - if buf: - yield pair(buf, 'eng') - buf = '' + def load_word_tag(self, f_name): + self.word_tag_tab = {} + with open(f_name, "rb") as f: + for lineno, line in enumerate(f, 1): + try: + line = line.strip().decode("utf-8") + if not line: + continue + word, _, tag = line.split(" ") + self.word_tag_tab[word] = tag + except Exception: + raise ValueError( + 'invalid POS dictionary entry in %s at Line %s: %s' % (f_name, lineno, line)) + def makesure_userdict_loaded(self): + if self.tokenizer.user_word_tag_tab: + self.word_tag_tab.update(self.tokenizer.user_word_tag_tab) + self.tokenizer.user_word_tag_tab = {} -def __cut_DAG(sentence): - DAG = jieba.get_DAG(sentence) - route = {} + def __cut(self, sentence): + prob, pos_list = viterbi( + sentence, char_state_tab_P, start_P, trans_P, emit_P) + begin, nexti = 0, 0 - jieba.calc(sentence, DAG, route) + for i, char in enumerate(sentence): + pos = pos_list[i][0] + if pos == 'B': + begin = i + elif pos == 'E': + yield pair(sentence[begin:i + 1], pos_list[i][1]) + nexti = i + 1 + elif pos == 'S': + yield pair(char, pos_list[i][1]) + nexti = i + 1 + if nexti < len(sentence): + yield pair(sentence[nexti:], pos_list[nexti][1]) - x = 0 - buf = '' - N = len(sentence) - while x < N: - y = route[x][1] + 1 - l_word = sentence[x:y] - if y - x == 1: - buf += l_word - else: - if buf: - if len(buf) == 1: - yield pair(buf, word_tag_tab.get(buf, 'x')) - elif not jieba.FREQ.get(buf): - recognized = __cut_detail(buf) - for t in recognized: - yield t - else: - for elem in buf: - yield pair(elem, word_tag_tab.get(elem, 'x')) - buf = '' - yield pair(l_word, word_tag_tab.get(l_word, 'x')) - x = y - - if buf: - if len(buf) == 1: - yield pair(buf, word_tag_tab.get(buf, 'x')) - elif not jieba.FREQ.get(buf): - recognized = __cut_detail(buf) - for t in recognized: - yield t - else: - for elem in buf: - yield pair(elem, word_tag_tab.get(elem, 'x')) - - -def __cut_internal(sentence, HMM=True): - sentence = strdecode(sentence) - blocks = re_han_internal.split(sentence) - if HMM: - __cut_blk = __cut_DAG - else: - __cut_blk = __cut_DAG_NO_HMM - - for blk in blocks: - if re_han_internal.match(blk): - for word in __cut_blk(blk): - yield word - else: - tmp = re_skip_internal.split(blk) - for x in tmp: - if re_skip_internal.match(x): - yield pair(x, 'x') - else: - for xx in x: - if re_num.match(xx): - yield pair(xx, 'm') + def __cut_detail(self, sentence): + blocks = re_han_detail.split(sentence) + for blk in blocks: + if re_han_detail.match(blk): + for word in self.__cut(blk): + yield word + else: + tmp = re_skip_detail.split(blk) + for x in tmp: + if x: + if re_num.match(x): + yield pair(x, 'm') elif re_eng.match(x): - yield pair(xx, 'eng') + yield pair(x, 'eng') else: - yield pair(xx, 'x') + yield pair(x, 'x') + + def __cut_DAG_NO_HMM(self, sentence): + DAG = 
self.tokenizer.get_DAG(sentence) + route = {} + self.tokenizer.calc(sentence, DAG, route) + x = 0 + N = len(sentence) + buf = '' + while x < N: + y = route[x][1] + 1 + l_word = sentence[x:y] + if re_eng1.match(l_word): + buf += l_word + x = y + else: + if buf: + yield pair(buf, 'eng') + buf = '' + yield pair(l_word, self.word_tag_tab.get(l_word, 'x')) + x = y + if buf: + yield pair(buf, 'eng') + buf = '' + + def __cut_DAG(self, sentence): + DAG = self.tokenizer.get_DAG(sentence) + route = {} + + self.tokenizer.calc(sentence, DAG, route) + + x = 0 + buf = '' + N = len(sentence) + while x < N: + y = route[x][1] + 1 + l_word = sentence[x:y] + if y - x == 1: + buf += l_word + else: + if buf: + if len(buf) == 1: + yield pair(buf, self.word_tag_tab.get(buf, 'x')) + elif not self.tokenizer.FREQ.get(buf): + recognized = self.__cut_detail(buf) + for t in recognized: + yield t + else: + for elem in buf: + yield pair(elem, self.word_tag_tab.get(elem, 'x')) + buf = '' + yield pair(l_word, self.word_tag_tab.get(l_word, 'x')) + x = y + + if buf: + if len(buf) == 1: + yield pair(buf, self.word_tag_tab.get(buf, 'x')) + elif not self.tokenizer.FREQ.get(buf): + recognized = self.__cut_detail(buf) + for t in recognized: + yield t + else: + for elem in buf: + yield pair(elem, self.word_tag_tab.get(elem, 'x')) + + def __cut_internal(self, sentence, HMM=True): + self.makesure_userdict_loaded() + sentence = strdecode(sentence) + blocks = re_han_internal.split(sentence) + if HMM: + cut_blk = self.__cut_DAG + else: + cut_blk = self.__cut_DAG_NO_HMM + + for blk in blocks: + if re_han_internal.match(blk): + for word in cut_blk(blk): + yield word + else: + tmp = re_skip_internal.split(blk) + for x in tmp: + if re_skip_internal.match(x): + yield pair(x, 'x') + else: + for xx in x: + if re_num.match(xx): + yield pair(xx, 'm') + elif re_eng.match(x): + yield pair(xx, 'eng') + else: + yield pair(xx, 'x') + + def _lcut_internal(self, sentence): + return list(self.__cut_internal(sentence)) + + def _lcut_internal_no_hmm(self, sentence): + return list(self.__cut_internal(sentence, False)) + + def cut(self, sentence, HMM=True): + for w in self.__cut_internal(sentence, HMM=HMM): + yield w + + def lcut(self, *args, **kwargs): + return list(self.cut(*args, **kwargs)) + +# default Tokenizer instance + +dt = POSTokenizer(jieba.dt) + +# global functions + +initialize = dt.initialize -def __lcut_internal(sentence): - return list(__cut_internal(sentence)) +def _lcut_internal(s): + return dt._lcut_internal(s) -def __lcut_internal_no_hmm(sentence): - return list(__cut_internal(sentence, False)) +def _lcut_internal_no_hmm(s): + return dt._lcut_internal_no_hmm(s) -@makesure_userdict_loaded def cut(sentence, HMM=True): + """ + Global `cut` function that supports parallel processing. + + Note that this only works using dt, custom POSTokenizer + instances are not supported. 
+ """ + global dt if jieba.pool is None: - for w in __cut_internal(sentence, HMM=HMM): + for w in dt.cut(sentence, HMM=HMM): yield w else: parts = strdecode(sentence).splitlines(True) if HMM: - result = jieba.pool.map(__lcut_internal, parts) + result = jieba.pool.map(_lcut_internal, parts) else: - result = jieba.pool.map(__lcut_internal_no_hmm, parts) + result = jieba.pool.map(_lcut_internal_no_hmm, parts) for r in result: for w in r: yield w + + +def lcut(sentence, HMM=True): + return list(cut(sentence, HMM)) diff --git a/test/demo.py b/test/demo.py index 84377ae..6ebb159 100644 --- a/test/demo.py +++ b/test/demo.py @@ -4,6 +4,12 @@ import sys sys.path.append("../") import jieba +import jieba.posseg +import jieba.analyse + +print('='*40) +print('1. 分词') +print('-'*40) seg_list = jieba.cut("我来到北京清华大学", cut_all=True) print("Full Mode: " + "/ ".join(seg_list)) # 全模式 @@ -16,3 +22,63 @@ print(", ".join(seg_list)) seg_list = jieba.cut_for_search("小明硕士毕业于中国科学院计算所,后在日本京都大学深造") # 搜索引擎模式 print(", ".join(seg_list)) + +print('='*40) +print('2. 添加自定义词典/调整词典') +print('-'*40) + +print('/'.join(jieba.cut('如果放到post中将出错。', HMM=False))) +#如果/放到/post/中将/出错/。 +print(jieba.suggest_freq(('中', '将'), True)) +#494 +print('/'.join(jieba.cut('如果放到post中将出错。', HMM=False))) +#如果/放到/post/中/将/出错/。 +print('/'.join(jieba.cut('「台中」正确应该不会被切开', HMM=False))) +#「/台/中/」/正确/应该/不会/被/切开 +print(jieba.suggest_freq('台中', True)) +#69 +print('/'.join(jieba.cut('「台中」正确应该不会被切开', HMM=False))) +#「/台中/」/正确/应该/不会/被/切开 + +print('='*40) +print('3. 关键词提取') +print('-'*40) +print(' TF-IDF') +print('-'*40) + +s = "此外,公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元,增资后,吉林欧亚置业注册资本由7000万元增加到5亿元。吉林欧亚置业主要经营范围为房地产开发及百货零售等业务。目前在建吉林欧亚城市商业综合体项目。2013年,实现营业收入0万元,实现净利润-139.13万元。" +for x, w in jieba.analyse.extract_tags(s, withWeight=True): + print('%s %s' % (x, w)) + +print('-'*40) +print(' TextRank') +print('-'*40) + +for x, w in jieba.analyse.textrank(s, withWeight=True): + print('%s %s' % (x, w)) + +print('='*40) +print('4. 词性标注') +print('-'*40) + +words = jieba.posseg.cut("我爱北京天安门") +for w in words: + print('%s %s' % (w.word, w.flag)) + +print('='*40) +print('6. 
Tokenize: 返回词语在原文的起止位置') +print('-'*40) +print(' 默认模式') +print('-'*40) + +result = jieba.tokenize('永和服装饰品有限公司') +for tk in result: + print("word %s\t\t start: %d \t\t end:%d" % (tk[0],tk[1],tk[2])) + +print('-'*40) +print(' 搜索模式') +print('-'*40) + +result = jieba.tokenize('永和服装饰品有限公司', mode='search') +for tk in result: + print("word %s\t\t start: %d \t\t end:%d" % (tk[0],tk[1],tk[2])) diff --git a/test/test_lock.py b/test/test_lock.py new file mode 100644 index 0000000..b7fcc97 --- /dev/null +++ b/test/test_lock.py @@ -0,0 +1,42 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +import jieba +import threading + +def inittokenizer(tokenizer, group): + print('===> Thread %s:%s started' % (group, threading.current_thread().ident)) + tokenizer.initialize() + print('<=== Thread %s:%s finished' % (group, threading.current_thread().ident)) + +tokrs1 = [jieba.Tokenizer() for n in range(5)] +tokrs2 = [jieba.Tokenizer('../extra_dict/dict.txt.small') for n in range(5)] + +thr1 = [threading.Thread(target=inittokenizer, args=(tokr, 1)) for tokr in tokrs1] +thr2 = [threading.Thread(target=inittokenizer, args=(tokr, 2)) for tokr in tokrs2] +for thr in thr1: + thr.start() +for thr in thr2: + thr.start() +for thr in thr1: + thr.join() +for thr in thr2: + thr.join() + +del tokrs1, tokrs2 + +print('='*40) + +tokr1 = jieba.Tokenizer() +tokr2 = jieba.Tokenizer('../extra_dict/dict.txt.small') + +thr1 = [threading.Thread(target=inittokenizer, args=(tokr1, 1)) for n in range(5)] +thr2 = [threading.Thread(target=inittokenizer, args=(tokr2, 2)) for n in range(5)] +for thr in thr1: + thr.start() +for thr in thr2: + thr.start() +for thr in thr1: + thr.join() +for thr in thr2: + thr.join()
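
The patch above moves jieba's module-level state into `Tokenizer` instances, with `jieba.dt` as the default and the old global functions mapped onto it. Below is a minimal sketch of the resulting multi-dictionary API; the dictionary path, cache directory, and sample sentences are illustrative assumptions, not part of the patch (the small dictionary ships under `extra_dict/` in the repository).

```python
# encoding=utf-8
import jieba

# Module-level functions are bound to the default tokenizer, jieba.dt.
print(jieba.lcut("我来到北京清华大学"))
print(jieba.lcut_for_search("小明硕士毕业于中国科学院计算所"))

# A second, independent tokenizer with its own dictionary and cache location;
# the path is assumed to point at the bundled small dictionary.
small = jieba.Tokenizer(dictionary='extra_dict/dict.txt.small')
small.tmp_dir = '/tmp'                     # cache dir for restricted file systems
small.add_word('云计算', freq=5, tag='n')   # per-instance vocabulary tweak
print(small.lcut("他来到了网易杭研大厦"))

# The default instance is unaffected by changes made on `small`.
print(jieba.lcut("他来到了网易杭研大厦"))
```

Note that, per the parallel-processing changes in this patch, `enable_parallel` only rebinds the global `cut`/`cut_for_search`, so custom instances such as `small` always run single-process.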
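
The keyword-extraction refactor keeps the old module-level calls but routes them through default `TFIDF` and `TextRank` instances, so several extractors with different IDF corpora or stop-word lists can coexist. A sketch under the assumption that the sample sentence and `topK` values are arbitrary:

```python
# encoding=utf-8
import jieba.analyse

s = ("此外,公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元,增资后,"
     "吉林欧亚置业注册资本由7000万元增加到5亿元。")

# TF-IDF with the new allowPOS filter; withWeight=True yields (word, weight) pairs.
for word, weight in jieba.analyse.extract_tags(
        s, topK=5, withWeight=True, allowPOS=('ns', 'n', 'vn', 'v')):
    print('%s %.4f' % (word, weight))

# TextRank shares the call shape but filters POS by default.
print(jieba.analyse.textrank(s, topK=5))

# A standalone extractor instance; idf_path=None falls back to the bundled
# idf.txt, and a custom IDF corpus path could be passed instead.
tfidf = jieba.analyse.TFIDF(idf_path=None)
print(tfidf.extract_tags(s, topK=5))
```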
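
POS tagging follows the same pattern: `jieba.posseg.dt` is the default `POSTokenizer`, and a `POSTokenizer` can be bound to any `jieba.Tokenizer` instance. A sketch assuming the repository's `test/userdict.txt` is reachable from the working directory:

```python
# encoding=utf-8
import jieba
import jieba.posseg as pseg

# Module-level posseg functions use the default jieba.posseg.dt.
for w in pseg.cut("我爱北京天安门"):
    print('%s %s' % (w.word, w.flag))

# A POSTokenizer tied to its own Tokenizer (and therefore its own dictionary
# and user words).
tok = jieba.Tokenizer()
tok.load_userdict('test/userdict.txt')   # assumed relative path to the sample dict
postok = pseg.POSTokenizer(tokenizer=tok)
words = postok.lcut("李小福是创新办主任也是云计算方面的专家")
print(' '.join('%s/%s' % (w.word, w.flag) for w in words))
```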