62 Commits

Author SHA1 Message Date
Elmi Ahmadov
d93dda397c avoid implicit namespaces
This PR fixes the ambiguous `partial_sort` call in KeywordExtractor.hpp.
We also have our own definition of it, so the compiler is confused about
which implementation to use. To fix it, we use the `std` namespace
explicitly.

Also, use the `std` namespace for the other data structures and include
their headers.
2025-04-10 19:10:05 +02:00
Elmi Ahmadov
588860b5b6 fix missing includes and make namespaces explicit 2025-04-10 16:11:20 +02:00
yanyiwu
016fc17575 Improve error logging for UTF-8 decoding failures across cppjieba components. Updated error messages in DictTrie, PosTagger, PreFilter, and SegmentBase to provide clearer context on the specific input causing the failure. This change enhances the debugging experience when handling UTF-8 encoded strings. 2024-12-08 17:26:28 +08:00
yanyiwu
42a93a4b98 Refactor decoding functions to use UTF-8 compliant methods
Updated multiple files to replace instances of DecodeRunesInString with DecodeUTF8RunesInString, ensuring proper handling of UTF-8 encoded strings. This change enhances the robustness of string decoding across the cppjieba library, including updates in DictTrie, HMMModel, PosTagger, PreFilter, SegmentBase, and Unicode files. Additionally, corresponding unit tests have been modified to reflect these changes.
2024-12-08 16:46:24 +08:00
yanyiwu
732812cdfb class Jieba: support default dictpath 2024-09-22 09:38:31 +08:00
yanyiwu
cc58d4f858 DictTrie: removed unused var 2024-09-21 21:29:55 +08:00
wuyanyi
03cc7c39ff feature: add RemoveWord api from https://github.com/yanyiwu/gojieba/pull/99 2022-10-16 13:17:19 +08:00
Yanyi Wu
8a258dfaf4
Merge pull request #127 from byronhe/patch-2
remove duplicate #include
2019-09-15 16:54:42 +08:00
byronhe
55a94b417c
fix typo 2019-09-04 20:50:11 +08:00
byronhe
6444f4b226
fix compile warning 2019-04-29 12:18:03 +08:00
byronhe
798b7b81c9
remove duplicate #include
2019-03-15 15:48:09 +08:00
zhoupeng
111fb007cf exposes InsertUserWord Find 2018-06-09 16:21:13 +08:00
zhoupeng
1e1e585194 LoadUserDict by set,vector 2018-06-08 14:23:01 +08:00
zhoupeng
1066bc085e fix input type, expose to Jieba 2018-06-08 01:32:47 +08:00
zhoupeng
d56e5c0659 InsertUserWord with freq arg, expose InserUserDictNode with vector<string> arg 2018-06-08 00:44:33 +08:00
Wangzhe
e7602afaac reduce Visual Studio compiler warnings 2017-06-27 23:00:31 +08:00
Roy Guo
f74d716570 Add Unicode offset/length support for Word 2016-10-16 13:05:56 +08:00
Roy Guo
a2f75a00d3 Add Unicode offset/length support for Word 2016-10-16 12:52:50 +08:00
yanyiwu
74c70c70cd create keyword_extract in Jieba 2016-09-11 21:42:53 +08:00
yanyiwu
4a755dff6a may be more friendly for compiler 2016-08-11 00:00:20 +08:00
yanyiwu
53bc279dea fix compiler warning 2016-07-23 20:49:27 +08:00
yanyiwu
0984c9ed3f update user dict loading method about word weight, and add unit tests 2016-07-22 23:53:49 +08:00
npes87184
0c3cf04b43 fix second element parse error in dict 2016-07-22 10:19:28 +08:00
bigelephant29
986106a553 change stoi to atoi 2016-07-21 10:54:08 +08:00
bigelephant29
2e1b6e0443 user dict support user weight and user tag 2016-07-21 10:38:46 +08:00
bigelephant29
b82acaf71e fix user dict tag bug : wrong buf index assigned 2016-07-21 10:06:24 +08:00
t-k-
5775a40bee Add LookupTag function for single token tag lookup. 2016-07-06 02:44:56 -06:00
Jaimin Pan
ce8cafe54a add tag capability for each segment 2016-06-27 18:10:42 +08:00
yanyiwu
c425bcc49f add Jieba::ResetSeparators api and unittest 2016-05-09 22:49:51 +08:00
yanyiwu
6e3ecec599 improve readability 2016-05-09 22:09:57 +08:00
yanyiwu
0a23d6b268 merge questionfish/master 2016-05-04 19:27:05 +08:00
mayunyun
d5a52a8e7b 1. remove stopword from span windows
2. update unittest
2016-05-04 17:52:30 +08:00
yanyiwu
5c739484ae merge the latest code from the master branch, and update unittest cases to pass ci 2016-05-03 23:20:03 +08:00
yanyiwu
f253db0133 use map/set instead of unordered_map/unordered_set to make result stable 2016-05-03 21:24:40 +08:00
Yanyi Wu
6d105a864d Update TextRankExtractor.hpp
remove unused function that uses the C++11 keyword `auto`
2016-05-03 19:53:40 +08:00
mayunyun
0f66a923b3 1. add unit tests
2. add constructor overloads and extraction function overloads
2016-05-03 18:06:14 +08:00
mayunyun
f2de41c15e code layout change: tab -> space 2016-05-03 09:03:16 +08:00
yanyiwu
5ac9e48eb0 rewrite QuerySegment, make Jieba::CutForSearch behave the same as [jieba] cut_for_search api
remove Jieba::SetQuerySegmentThreshold
2016-05-02 16:18:36 +08:00
mayunyun
1aa0a32d90 code format check 2016-04-25 20:28:47 +08:00
mayunyun
669e971e3e new file: include/cppjieba/TextRankExtractor.hpp
Add TextRank Keyword Extractor to JiebaCpp
Add TextRank keyword extraction
2016-04-25 20:20:50 +08:00
yanyiwu
3befc42697 update KeywordExtractor::Word's printing format to json format 2016-04-19 16:00:53 +08:00
yanyiwu
29e085904d add log and unittest 2016-04-18 14:55:42 +08:00
yanyiwu
63e9c94fb7 add unicode decoding unittest 2016-04-18 14:37:17 +08:00
yanyiwu
6fa843b527 override Cut functions, add location information into Word results; 2016-04-17 23:39:57 +08:00
yanyiwu
b6703aba90 use offset instead of str in RuneStr 2016-04-17 22:50:32 +08:00
yanyiwu
e7a45d2dde remove LevelSegment 2016-04-17 22:23:00 +08:00
yanyiwu
42a73eeb64 make compiler happy 2016-04-17 22:11:58 +08:00
yanyiwu
dcced8561e remove namespace unicode 2016-04-17 21:59:10 +08:00
yanyiwu
6ff6fe1430 WordRange construct 2016-04-17 21:57:36 +08:00
yanyiwu
339e3ca772 big change: add RuneStr for the position of word in string 2016-04-17 17:30:05 +08:00