yanyiwu
42a93a4b98
Refactor decoding functions to use UTF-8 compliant methods
Updated multiple files to replace instances of DecodeRunesInString with DecodeUTF8RunesInString, ensuring proper handling of UTF-8 encoded strings. This change enhances the robustness of string decoding across the cppjieba library, including updates in DictTrie, HMMModel, PosTagger, PreFilter, SegmentBase, and Unicode files. Additionally, corresponding unit tests have been modified to reflect these changes.
2024-12-08 16:46:24 +08:00
yanyiwu
aa1def5ddb
add default argument input to class Jieba unittest
2024-09-22 09:43:04 +08:00
yanyiwu
a110ab10cc
[cmake] fetch googletest
2024-08-16 10:13:07 +08:00
yanyiwu
74c70c70cd
create keyword_extract in Jieba
2016-09-11 21:42:53 +08:00
yanyiwu
0984c9ed3f
update user dict loading method about word weight, and add unit tests
2016-07-22 23:53:49 +08:00
Jaimin Pan
ce8cafe54a
add tag capability for each segment
2016-06-27 18:10:42 +08:00
sooda
7d503e4b13
fix unittest cmake macro bug
2016-06-08 10:38:20 +08:00
yanyiwu
c425bcc49f
add Jieba::ResetSeparators api and unittest
2016-05-09 22:49:51 +08:00
yanyiwu
b355e9f487
update unittest to pass 'make test'
2016-05-04 19:33:05 +08:00
yanyiwu
5c739484ae
merge the latest code from master branch, and update unittest cases to pass ci
2016-05-03 23:20:03 +08:00
yanyiwu
f253db0133
use map/set instead of unordered_map/unordered_set to make result stable
2016-05-03 21:24:40 +08:00
yanyiwu
39316114c5
correct unittest case
2016-05-03 20:49:47 +08:00
yanyiwu
a1ea1d0757
add textrank unittest into cmake
2016-05-03 20:01:44 +08:00
mayunyun
0f66a923b3
1. Add unit tests
2. Add constructor overloads and extraction function overloads
2016-05-03 18:06:14 +08:00
yanyiwu
5ac9e48eb0
rewrite QuerySegment, make Jieba::CutForSearch behave the same as [jieba] cut_for_search api
remove Jieba::SetQuerySegmentThreshold
2016-05-02 16:18:36 +08:00
qinwf
c84594f620
add Windows CI with MSVC
2016-04-27 17:45:48 +08:00
yanyiwu
3befc42697
update KeywordExtractor::Word's printing format to json format
2016-04-19 16:00:53 +08:00
yanyiwu
29e085904d
add log and unittest
2016-04-18 14:55:42 +08:00
yanyiwu
63e9c94fb7
add unicode decoding unittest
2016-04-18 14:37:17 +08:00
yanyiwu
6fa843b527
overload Cut functions, add location information to Word results
2016-04-17 23:39:57 +08:00
yanyiwu
b6703aba90
use offset instead of str in RuneStr
2016-04-17 22:50:32 +08:00
yanyiwu
e7a45d2dde
remove LevelSegment
2016-04-17 22:23:00 +08:00
yanyiwu
dcced8561e
remove namespace unicode
2016-04-17 21:59:10 +08:00
yanyiwu
339e3ca772
big change: add RuneStr for the position of word in string
2016-04-17 17:30:05 +08:00
yanyiwu
c19736995c
Add KeywordExtractor::Word and add more overloaded KeywordExtractor::Extract
2016-03-26 22:12:40 +08:00
yanyiwu
0a7b6e62f3
add Unicode32 cases for cut testing
2016-02-18 15:18:35 +08:00
yanyiwu
e6454fef77
use HashMap in Trie, and remove the base array of trie root node, see details in Changelog
2016-02-12 01:37:39 +08:00
yanyiwu
3c5ad24260
source code layout change:
1. src/ -> include/cppjieba/
2. src/limonp/ -> deps/limonp/
3. server/husky -> deps/husky/
4. test/unittest/gtest -> deps/gtest
2016-01-11 14:25:02 +08:00
yanyiwu
bcb112a4b1
upgrade basic functions
2015-12-12 21:25:57 +08:00
yanyiwu
8482bef442
change multi user dicts separator from ':' to '|;'
2015-12-09 00:01:27 +08:00
yanyiwu
8dc01ae614
add Jieba::Locate function to get word location of cut sentence
2015-12-02 01:19:23 +08:00
yanyiwu
c3fd357a6d
[QuerySegment] add SetMaxWordLen, GetMaxWordLen, and filter the English sentence in secondary Cut
2015-10-29 14:23:01 +08:00
yanyiwu
83cc67cb15
[code style] uppercase function name
2015-10-29 12:39:10 +08:00
yanyiwu
6f51373280
support optional user word freq weight
2015-10-09 11:20:06 +08:00
yanyiwu
ecacf118e6
[code style] lower case namespace
2015-10-08 21:13:11 +08:00
yanyiwu
16b69e35c1
delete Application.hpp, use Jieba.hpp instead
2015-10-08 21:03:09 +08:00
yanyiwu
4d56be920b
support optional user word freq weight
2015-10-08 20:05:27 +08:00
yanyiwu
b28d6db574
code style
2015-10-08 17:08:57 +08:00
yanyiwu
5bf7454ad2
add multi user dict unittest
2015-09-25 16:07:01 +08:00
yanyiwu
ea4d81cde7
add segment cut case
2015-09-18 14:28:34 +08:00
yanyiwu
eb6f47b6b0
refactor unittest
2015-09-13 18:09:56 +08:00
yanyiwu
8eef9a13a8
fix bug about optional argument hmm
2015-09-13 18:06:44 +08:00
yanyiwu
14974d51b4
abandon ISegment
2015-09-13 17:02:04 +08:00
yanyiwu
e9241d9025
fixed the bug in the last commit
2015-09-13 16:18:48 +08:00
yanyiwu
28bcb3bf57
use PreFilter in SegmentBase
2015-09-13 16:05:17 +08:00
yanyiwu
0542dd1cfd
add PreFilter
2015-09-13 15:10:10 +08:00
yanyiwu
1babe57ebc
fine-grained segmentation feature
2015-08-30 16:35:21 +08:00
yanyiwu
3c60c35906
fix FullSegment bug where some single characters were not output
2015-08-30 13:09:37 +08:00
yanyiwu
001a69d8c6
add fine-grained segmentation to MPSegment.
2015-08-30 01:04:30 +08:00
yanyiwu
0e0318f6ad
integrate LevelSegment into Application
2015-08-11 11:57:58 +08:00