136 Commits

Author SHA1 Message Date
yanyiwu
42a93a4b98 Refactor decoding functions to use UTF-8 compliant methods
Updated multiple files to replace instances of DecodeRunesInString with DecodeUTF8RunesInString, ensuring proper handling of UTF-8 encoded strings. This change enhances the robustness of string decoding across the cppjieba library, including updates in DictTrie, HMMModel, PosTagger, PreFilter, SegmentBase, and Unicode files. Additionally, corresponding unit tests have been modified to reflect these changes.
2024-12-08 16:46:24 +08:00
yanyiwu
aa1def5ddb class Jieba unittest add default argument input 2024-09-22 09:43:04 +08:00
yanyiwu
a110ab10cc [cmake] fetch googletest 2024-08-16 10:13:07 +08:00
yanyiwu
74c70c70cd create keyword_extract in Jieba 2016-09-11 21:42:53 +08:00
yanyiwu
0984c9ed3f update user dict loading method about word weight, and add unit tests 2016-07-22 23:53:49 +08:00
Jaimin Pan
ce8cafe54a add tag capability for each segment 2016-06-27 18:10:42 +08:00
sooda
7d503e4b13 fix unittest cmake macro bug 2016-06-08 10:38:20 +08:00
yanyiwu
c425bcc49f add Jieba::ResetSeparators api and unittest 2016-05-09 22:49:51 +08:00
yanyiwu
b355e9f487 update unittest to pass 'make test' 2016-05-04 19:33:05 +08:00
yanyiwu
5c739484ae merge the latest codes in master branch, and update unittest cases to pass ci 2016-05-03 23:20:03 +08:00
yanyiwu
f253db0133 use map/set instead of unordered_map/unordered_set to make result stable 2016-05-03 21:24:40 +08:00
yanyiwu
39316114c5 correct unittest case 2016-05-03 20:49:47 +08:00
yanyiwu
a1ea1d0757 add textrank unittest into cmake 2016-05-03 20:01:44 +08:00
mayunyun
0f66a923b3 1. Add unit tests
2. Add overloads for the constructor and for the extraction function
2016-05-03 18:06:14 +08:00
yanyiwu
5ac9e48eb0 rewrite QuerySegment, make Jieba::CutForSearch behave the same as [jieba] cut_for_search api
remove Jieba::SetQuerySegmentThreshold
2016-05-02 16:18:36 +08:00
qinwf
c84594f620 add Windows CI with MSVC 2016-04-27 17:45:48 +08:00
yanyiwu
3befc42697 update KeywordExtractor::Word's printing format to json format 2016-04-19 16:00:53 +08:00
yanyiwu
29e085904d add log and unittest 2016-04-18 14:55:42 +08:00
yanyiwu
63e9c94fb7 add unicode decoding unittest 2016-04-18 14:37:17 +08:00
yanyiwu
6fa843b527 override Cut functions, add location information into Word results; 2016-04-17 23:39:57 +08:00
yanyiwu
b6703aba90 use offset instead of str in RuneStr 2016-04-17 22:50:32 +08:00
yanyiwu
e7a45d2dde remove LevelSegment 2016-04-17 22:23:00 +08:00
yanyiwu
dcced8561e remove namespace unicode 2016-04-17 21:59:10 +08:00
yanyiwu
339e3ca772 big change: add RuneStr for the position of word in string 2016-04-17 17:30:05 +08:00
yanyiwu
c19736995c Add KeywordExtractor::Word and add more overridden KeywordExtractor::Extract 2016-03-26 22:12:40 +08:00
yanyiwu
0a7b6e62f3 add Unicode32 cases for cut testing 2016-02-18 15:18:35 +08:00
yanyiwu
e6454fef77 use HashMap in Trie, and remove the base array of trie root node, see details in Changelog 2016-02-12 01:37:39 +08:00
yanyiwu
3c5ad24260 source code layout change:
1. src/ -> include/cppjieba/
2. src/limonp/ -> deps/limonp/
3. server/husky -> deps/husky/
4. test/unittest/gtest -> deps/gtest
2016-01-11 14:25:02 +08:00
yanyiwu
bcb112a4b1 upgrade basic functions 2015-12-12 21:25:57 +08:00
yanyiwu
8482bef442 change multi user dicts separator from ':' to '|;' 2015-12-09 00:01:27 +08:00
yanyiwu
8dc01ae614 add Jieba::Locate function to get word location of cut sentence 2015-12-02 01:19:23 +08:00
yanyiwu
c3fd357a6d [QuerySegment] add SetMaxWordLen,GetMaxWordLen, and filter the english sentence in secondary Cut 2015-10-29 14:23:01 +08:00
yanyiwu
83cc67cb15 [code style] uppercase function name 2015-10-29 12:39:10 +08:00
yanyiwu
6f51373280 support optional user word freq weight 2015-10-09 11:20:06 +08:00
yanyiwu
ecacf118e6 [code style] lower case namespace 2015-10-08 21:13:11 +08:00
yanyiwu
16b69e35c1 delete Application.hpp, use Jieba.hpp instead 2015-10-08 21:03:09 +08:00
yanyiwu
4d56be920b support optional user word freq weight 2015-10-08 20:05:27 +08:00
yanyiwu
b28d6db574 code style 2015-10-08 17:08:57 +08:00
yanyiwu
5bf7454ad2 add multi user dict unittest 2015-09-25 16:07:01 +08:00
yanyiwu
ea4d81cde7 add segment cut case 2015-09-18 14:28:34 +08:00
yanyiwu
eb6f47b6b0 refactor unittest 2015-09-13 18:09:56 +08:00
yanyiwu
8eef9a13a8 fix bug about optional argument hmm 2015-09-13 18:06:44 +08:00
yanyiwu
14974d51b4 abandon ISegment 2015-09-13 17:02:04 +08:00
yanyiwu
e9241d9025 fixed the bug in the last commit 2015-09-13 16:18:48 +08:00
yanyiwu
28bcb3bf57 use PreFilter in SegmentBase 2015-09-13 16:05:17 +08:00
yanyiwu
0542dd1cfd add PreFilter 2015-09-13 15:10:10 +08:00
yanyiwu
1babe57ebc fine-grained segmentation feature 2015-08-30 16:35:21 +08:00
yanyiwu
3c60c35906 fix FullSegment bug where some single characters produced no output 2015-08-30 13:09:37 +08:00
yanyiwu
001a69d8c6 add fine-grained segmentation to MPSegment 2015-08-30 01:04:30 +08:00
yanyiwu
0e0318f6ad integrate LevelSegment into Application 2015-08-11 11:57:58 +08:00