Elmi Ahmadov
d93dda397c
avoid implicit namespaces
...
This PR fixes the ambiguous `partial_sort` call in KeywordExtractor.hpp.
We also have our own definition of it, so the compiler is confused about which
implementation should be used. To fix it, we use the `std` namespace
explicitly.
Also, use the `std` namespace for the other data structures and include
their headers.
2025-04-10 19:10:05 +02:00
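The ambiguity described in d93dda397c can be sketched as follows. This is a minimal illustration, not the actual cppjieba code: the `demo` namespace, its `partial_sort` stand-in, and `TopK` are hypothetical names introduced only for this example. With a using-directive in effect, an unqualified call finds the local helper through ordinary lookup and `std::partial_sort` through argument-dependent lookup (the comparator is a `std` type), so the compiler would typically reject it as ambiguous. Qualifying the call with `std::` removes the ambiguity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical project-local helper with the same name and parameter shape
// as the standard algorithm (an assumption, not the real cppjieba definition).
namespace demo {
template <class Iter, class Compare>
void partial_sort(Iter first, Iter middle, Iter last, Compare comp) {
  (void)middle;
  std::sort(first, last, comp);  // simple stand-in: sorts the whole range
}
}  // namespace demo

using namespace demo;  // makes demo::partial_sort visible to unqualified calls

// Returns the k largest values in descending order.
// Precondition: k <= values.size().
inline std::vector<int> TopK(std::vector<int> values, std::size_t k) {
  assert(k <= values.size());
  std::vector<int>::iterator middle =
      values.begin() + static_cast<std::ptrdiff_t>(k);

  // An unqualified call such as
  //   partial_sort(values.begin(), middle, values.end(), std::greater<int>());
  // would see both demo::partial_sort and std::partial_sort as candidates.

  // Explicit qualification selects the standard algorithm unambiguously.
  std::partial_sort(values.begin(), middle, values.end(), std::greater<int>());
  values.resize(k);
  return values;
}
```

Calling `TopK` on `{5, 1, 9, 3, 7}` with `k = 3` yields the three largest values in descending order.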
Elmi Ahmadov
588860b5b6
fix missing includes and make namespaces explicit
2025-04-10 16:11:20 +02:00
yanyiwu
016fc17575
Improve error logging for UTF-8 decoding failures across cppjieba components. Updated error messages in DictTrie, PosTagger, PreFilter, and SegmentBase to provide clearer context on the specific input causing the failure. This change enhances the debugging experience when handling UTF-8 encoded strings.
2024-12-08 17:26:28 +08:00
yanyiwu
42a93a4b98
Refactor decoding functions to use UTF-8 compliant methods
...
Updated multiple files to replace instances of DecodeRunesInString with DecodeUTF8RunesInString, ensuring proper handling of UTF-8 encoded strings. This change enhances the robustness of string decoding across the cppjieba library, including updates in DictTrie, HMMModel, PosTagger, PreFilter, SegmentBase, and Unicode files. Additionally, corresponding unit tests have been modified to reflect these changes.
2024-12-08 16:46:24 +08:00
yanyiwu
732812cdfb
class Jieba: support default dictpath
2024-09-22 09:38:31 +08:00
yanyiwu
cc58d4f858
DictTrie: removed unused var
2024-09-21 21:29:55 +08:00
wuyanyi
03cc7c39ff
feature: add RemoveWord api from https://github.com/yanyiwu/gojieba/pull/99
2022-10-16 13:17:19 +08:00
Yanyi Wu
8a258dfaf4
Merge pull request #127 from byronhe/patch-2
...
remove duplicate #include
2019-09-15 16:54:42 +08:00
byronhe
55a94b417c
fix typo
2019-09-04 20:50:11 +08:00
byronhe
6444f4b226
fix compile warning
2019-04-29 12:18:03 +08:00
byronhe
798b7b81c9
remove duplicate #include
...
remove duplicate #include
2019-03-15 15:48:09 +08:00
zhoupeng
111fb007cf
exposes InsertUserWord Find
2018-06-09 16:21:13 +08:00
zhoupeng
1e1e585194
LoadUserDict by set, vector
2018-06-08 14:23:01 +08:00
zhoupeng
1066bc085e
fix input type, expose to Jieba
2018-06-08 01:32:47 +08:00
zhoupeng
d56e5c0659
InsertUserWord with freq arg, expose InserUserDictNode with vector<string> arg
2018-06-08 00:44:33 +08:00
Wangzhe
e7602afaac
Reduce Visual Studio compiler warnings
2017-06-27 23:00:31 +08:00
Roy Guo
f74d716570
Add Unicode offset/length support for Word
2016-10-16 13:05:56 +08:00
Roy Guo
a2f75a00d3
Add Unicode offset/length support for Word
2016-10-16 12:52:50 +08:00
yanyiwu
74c70c70cd
create keyword_extract in Jieba
2016-09-11 21:42:53 +08:00
yanyiwu
4a755dff6a
may be more friendly for compiler
2016-08-11 00:00:20 +08:00
yanyiwu
53bc279dea
fix compiler warning
2016-07-23 20:49:27 +08:00
yanyiwu
0984c9ed3f
update user dict loading method about word weight, and add unit tests
2016-07-22 23:53:49 +08:00
npes87184
0c3cf04b43
fix second element parse error in dict
2016-07-22 10:19:28 +08:00
bigelephant29
986106a553
change stoi to atoi
2016-07-21 10:54:08 +08:00
bigelephant29
2e1b6e0443
user dict support user weight and user tag
2016-07-21 10:38:46 +08:00
bigelephant29
b82acaf71e
fix user dict tag bug: wrong buf index assigned
2016-07-21 10:06:24 +08:00
t-k-
5775a40bee
Add LookupTag function for single token tag lookup.
2016-07-06 02:44:56 -06:00
Jaimin Pan
ce8cafe54a
add tag capability for each segment
2016-06-27 18:10:42 +08:00
yanyiwu
c425bcc49f
add Jieba::ResetSeparators api and unittest
2016-05-09 22:49:51 +08:00
yanyiwu
6e3ecec599
improve readability
2016-05-09 22:09:57 +08:00
yanyiwu
0a23d6b268
merge questionfish/master
2016-05-04 19:27:05 +08:00
mayunyun
d5a52a8e7b
1. remove stopword from span windows
...
2. update unittest
2016-05-04 17:52:30 +08:00
yanyiwu
5c739484ae
merge the latest code from the master branch, and update unittest cases to pass CI
2016-05-03 23:20:03 +08:00
yanyiwu
f253db0133
use map/set instead of unordered_map/unordered_set to make result stable
2016-05-03 21:24:40 +08:00
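The stability that f253db0133 refers to follows from a property of the standard containers: `std::map` iterates in sorted key order, whereas the iteration order of `std::unordered_map` is unspecified and can differ between standard library implementations. A minimal sketch, using a hypothetical `OrderedKeys` helper that is not part of cppjieba:

```cpp
#include <map>
#include <string>
#include <vector>

// Collects the keys of a word -> score map in iteration order.
// With std::map this order is always ascending by key, independent of
// insertion order or platform.
inline std::vector<std::string> OrderedKeys(
    const std::map<std::string, double>& scores) {
  std::vector<std::string> keys;
  for (std::map<std::string, double>::const_iterator it = scores.begin();
       it != scores.end(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```

The trade-off is O(log n) lookups instead of average O(1), in exchange for output that is reproducible across runs and compilers.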
Yanyi Wu
6d105a864d
Update TextRankExtractor.hpp
...
remove unused function that uses the C++11 keyword `auto`
2016-05-03 19:53:40 +08:00
mayunyun
0f66a923b3
1. Add unit tests
...
2. Add constructor overloads and extraction function overloads
2016-05-03 18:06:14 +08:00
mayunyun
f2de41c15e
code layout change: tab -> space
2016-05-03 09:03:16 +08:00
yanyiwu
5ac9e48eb0
rewrite QuerySegment, make Jieba::CutForSearch behave the same as [jieba] cut_for_search api
...
remove Jieba::SetQuerySegmentThreshold
2016-05-02 16:18:36 +08:00
mayunyun
1aa0a32d90
code format check
2016-04-25 20:28:47 +08:00
mayunyun
669e971e3e
new file: include/cppjieba/TextRankExtractor.hpp
...
Add TextRank Keyword Extractor to JiebaCpp
Add TextRank keyword extraction
2016-04-25 20:20:50 +08:00
yanyiwu
3befc42697
update KeywordExtractor::Word's printing format to json format
2016-04-19 16:00:53 +08:00
yanyiwu
29e085904d
add log and unittest
2016-04-18 14:55:42 +08:00
yanyiwu
63e9c94fb7
add unicode decoding unittest
2016-04-18 14:37:17 +08:00
yanyiwu
6fa843b527
override Cut functions, add location information into Word results;
2016-04-17 23:39:57 +08:00
yanyiwu
b6703aba90
use offset instead of str in RuneStr
2016-04-17 22:50:32 +08:00
yanyiwu
e7a45d2dde
remove LevelSegment
2016-04-17 22:23:00 +08:00
yanyiwu
42a73eeb64
make compiler happy
2016-04-17 22:11:58 +08:00
yanyiwu
dcced8561e
remove namespace unicode
2016-04-17 21:59:10 +08:00
yanyiwu
6ff6fe1430
WordRange construct
2016-04-17 21:57:36 +08:00
yanyiwu
339e3ca772
big change: add RuneStr for the position of word in string
2016-04-17 17:30:05 +08:00