jieba

mirror of https://github.com/fxsjy/jieba.git synced 2025-07-24 00:00:05 +08:00

Author	SHA1	Message	Date
jerryday	e5e41a4aad	fix pair object in dict problem	2015-10-30 16:38:50 +08:00
jerryday	4f8ca83661	add a withFlag param in textrank	2015-10-30 15:40:41 +08:00
jerryday	26e339f8f7	add a withFlag param to extract_tags	2015-10-30 11:09:24 +08:00
fxsjy	cb414cb861	version update	2015-06-27 16:49:44 +08:00
Dingyuan Wang	66fe17517d	prevent moving across different filesystems at tempfile.mkstemp	2015-06-26 22:12:39 +08:00
Dingyuan Wang	be46ddef9a	use shutil.move for all platforms in case of different filesystems	2015-06-26 21:52:53 +08:00
Dingyuan Wang	ceb5c26be4	fix self.FREQ in cut_for_search; make pair object iterable	2015-06-01 14:36:38 +08:00
Dingyuan Wang	3b76328f2a	allow ignoring word frequency while providing pos tag	2015-05-23 21:51:00 +08:00
Dingyuan Wang	94840a734c	wraps most globals in classes. API changes: class jieba.Tokenizer, jieba.posseg.POSTokenizer; class jieba.analyse.TFIDF, jieba.analyse.TextRank; global functions are mapped to jieba.(posseg.)dt, the default (POS)Tokenizer; multiprocessing only works with jieba.(posseg.)dt; new lcut, lcut_for_search functions that return a list; jieba.analyse.textrank now returns 20 items by default. Tests: added test_lock.py to test multithread locking; demo.py now contains most of the examples in README	2015-05-09 21:29:05 +08:00
Gilbert Liu	f6e57ab2ae	fixed an error in load_userdict()	2015-05-01 12:52:28 -07:00
fxsjy	29d2b838dc	a minor version on pypi, which removes *.pyc	2015-04-17 19:35:12 +08:00
Wang Bin	84ffa0d4bf	exclude word fragments from FREQ	2015-04-02 11:06:55 +08:00
fxsjy	89481cfd84	version update 0.36	2015-03-20 11:00:55 +08:00
Dingyuan Wang	4a552ca94f	suggest word frequency, support passing str to add_word	2015-03-14 12:44:19 +08:00
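Yuan-Yi Chang	62433a3205	Allow jieba to specify the directory where cache_file is generated, so that jieba can run in a read-only file system environment. 1. Before calling jieba.cut() or related operations, set the directory via jieba.tmp_dir. 2. When the application environment is a read-only file system, jieba can run normally by generating the cache_file in advance. 3. Real-world cases are Google App Engine and Heroku: the free tier of the former has only 128MB of memory and cannot run it, while the free environment of the latter has 512MB and runs it normally. Before deploying, generate the cache_file locally, then deploy it together with the application to Google App Engine or Heroku.	2015-02-27 17:25:59 +08:00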
Dingyuan Wang	f29430f49e	details in textrank; update README	2015-02-16 21:25:55 +08:00
zhangcheng	01b7f6efcf	improve some details based on other committers' advice	2015-02-16 20:35:45 +08:00
zhangcheng	8b8c6c85d0	remove unused import	2015-02-16 15:51:05 +08:00
zhangcheng	a6d1b2479e	build stable sort for graph iteration, then we can get stable result and adapt details for python 3~	2015-02-16 15:49:10 +08:00
zhangcheng	1152db7736	build stable sort for graph iteration, then we can get stable result.	2015-02-16 15:46:36 +08:00
fxsjy	49657c976d	make extract_tags behavior compatible with previous version	2015-02-14 21:23:58 +08:00
fxsjy	abcaf3e475	fix bug: load_userdict	2015-02-14 19:56:38 +08:00
Jack	a06b7d388e	fix bug in __main__.py	2015-02-12 14:08:39 +08:00
Dingyuan Wang	f2b7183a71	use str.splitlines to avoid losing line breaks	2015-02-12 12:39:14 +08:00
Dingyuan Wang	f808ea0ebb	use only one dict to store words and prefixes	2015-02-12 10:31:52 +08:00
Dingyuan Wang	32a0e92a09	don't compile re every time; autopep8	2015-02-10 21:22:34 +08:00
Dingyuan Wang	22bcf8be7a	Merge master and jieba3k, make the code Python 2/3 compatible	2015-02-10 20:54:55 +08:00
Dingyuan Wang	4197dfb8fa	store int directly in FREQ; small improvements	2015-02-09 16:26:00 +08:00
Dingyuan Wang	765fd6b7f0	store int directly in FREQ; small improvements	2015-02-09 16:14:12 +08:00
Dingyuan Wang	7bcb128f5f	fix textrank divided by zero; fix posseg.pair.__repr__	2014-12-20 00:12:42 +08:00
Lin	fea3aec6bd	Fix division-by-zero issue when words are not found in dict.	2014-12-05 17:13:12 +08:00
Dingyuan Wang	c6b386f65b	update jieba3k	2014-11-29 16:06:20 +08:00
Dingyuan Wang	7b7c6955a9	complete the setup.py, fix #202 problem in posseg	2014-11-29 15:33:42 +08:00
Nomaka	9cb76dd8b9	Update __init__.py: the idx parameter of calc is unused	2014-11-18 16:00:49 +08:00
walkskyer	a336e26403	Add an allowPOS parameter to the textrank function, and change the allowPOS parameter of extract_tags to be consistent with textrank.	2014-11-15 18:36:09 +08:00
walkskyer	bab5f362ba	Convert the allowPOS parameter of extract_tags to a frozenset to reduce lookup time.	2014-11-15 18:14:47 +08:00
fxsjy	447c1ded8c	fix problem for python3.2	2014-11-15 13:44:30 +08:00
walkskyer	d82d2c18df	Add part-of-speech filtering to the keyword extraction function	2014-11-13 22:26:22 +08:00
fxsjy	315a411e52	version update	2014-11-13 10:43:43 +08:00
walkskyer	5571a0337a	Fix stop words handling not accounting for "\r", which prevented correct matching.	2014-11-12 22:33:27 +08:00
Dingyuan Wang	7a6caa0c3c	port extract_tags, etc to jieba3k; add auto2to3 script	2014-11-07 23:33:31 +08:00
Dingyuan Wang	751ff35eb5	improve extract_tags; unify extract_tags and textrank	2014-10-31 23:15:51 +08:00
Dingyuan Wang	e3f3dcccba	improve the loading and caching process	2014-10-31 21:56:09 +08:00
Dingyuan Wang	fd9f1f2c0e	update README, textrank, etc.	2014-10-25 14:23:37 +08:00
Dingyuan Wang	a6119cc995	add custom dictionary to __main__; update README; slightly optimize textrank	2014-10-25 12:59:36 +08:00
zhangcheng	6eb9f6149c	add a simple implementation of textrank	2014-10-24 21:15:54 +08:00
fxsjy	f5ca87e088	merge change of @fukuball	2014-10-23 15:59:08 +08:00
fxsjy	ba87fcb01f	remove trie, use prefix set instead	2014-10-20 14:08:09 +08:00
fxsjy	82bfffb6ed	version update to 0.34	2014-10-20 13:35:13 +08:00
Dingyuan Wang	bb1e6000c6	fix version; fix spaces at end of line	2014-10-19 10:57:46 +08:00

211 Commits