Compare commits


168 Commits

Author SHA1 Message Date
medcl
9338c19104 update to 8.4.1 2022-09-02 18:44:03 +08:00
Medcl
0fb53ac32c
Update pom.xml
Update log4j
2022-01-19 11:59:06 +08:00
medcl
b637708ba0 update log4j 2021-12-13 09:45:53 +08:00
medcl
9c47725ea0 update for 7.14 2021-08-04 17:19:10 +08:00
Medcl
8e36b3240e
Update FUNDING.yml 2021-05-19 17:27:37 +08:00
Medcl
e0157d5f39
Update FUNDING.yml 2021-05-19 17:27:04 +08:00
Medcl
0fccc038e2
Create FUNDING.yml 2021-05-19 16:50:12 +08:00
Jack
5a1b8c8da6
Read chunked remote words (#817)
Fix chunked content that could not be read because the response carries no content length.
This fixes issue #780.
2020-09-06 16:34:40 +08:00
medcl
1375ca6d39 fix #789 2020-06-10 16:05:01 +08:00
Howard
4619effa15 translate log messages from Chinese to English (#746) 2019-12-19 15:31:04 +08:00
medcl
5f53f1a5bf Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2019-10-07 19:01:51 +08:00
medcl
904a7493ea update to 7.4.0 2019-10-07 19:01:29 +08:00
zhipingpan
06e8a23d18 Update AnalyzeContext.java (#673) 2019-05-01 16:57:44 +08:00
Hongliang Wang
a1d6ba8ca2 Correct Search Analyzer (#668)
The former search analyzer `ik-max-word` gave results inconsistent with those described later in the README file.
2019-04-19 20:23:43 +08:00
medcl
90c9b58354 update example 2019-04-11 10:07:22 +08:00
medcl
ba8bb85f31 update to support 7.x 2019-04-11 09:35:19 +08:00
medcl
125ac3c5e5 Merge branch 'pr/621' 2019-03-25 11:02:22 +08:00
medcl
f0dd522e60 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2019-03-25 10:42:23 +08:00
medcl
9eaa2b90eb fix NPE 2019-03-06 19:02:56 +08:00
pengcong90
9873489ba7
Update AnalyzeContext.java
Segmenting 金力泰合同审批 with ik_smart yields (金  力  泰  合同  审批), while ik_max_word yields (金  力  泰合  合同  审批  批); as a result, searches for 金力泰 or 金力泰合同审批 find nothing. Reading the source shows that 泰 is not in the dictionary while 泰合 and 合同 are, so during disambiguation the smart mode discards 泰合 under the higher-reverse-frequency rule and 泰 is emitted as a lone character. When emitting results, we can detect single characters that are absent from the dictionary but involved in a lexeme conflict, and also emit the single characters of the preceding overlapping lexeme; this resolves the problem.
2018-11-21 11:00:29 +08:00
杨晓东
949531572b Adapt to elasticsearch 6.5.0 (#615)
Signed-off-by: 杨晓东 <03131302@163.com>
2018-11-20 13:36:39 +00:00
byronhe
1d750a9bdd Update AnalyzeContext.java (#617) 2018-11-20 13:15:37 +00:00
黄松
3a7a81c29d Update README.md (#581) 2018-08-06 16:54:06 +08:00
medcl
1422d5b96c Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2018-07-26 10:25:44 +08:00
medcl
9fff6379ef remove deploy in travis 2018-07-26 10:25:16 +08:00
Rueian
5190dba198 Grant java.net.SocketPermission (#565) 2018-06-28 16:11:46 +08:00
wksw
83fa2ff8b2 Extracting directly into the plugins directory prevents es from starting (#564)
Extracting directly into plugins raises the error: Could not load plugin descriptor for plugin directory [plugin-descriptor.properties]
2018-06-26 10:14:02 +08:00
medcl
0222529290 Remove intermediate elasticsearch directory within plugin zips 2018-06-19 11:25:29 +08:00
medcl
5e8d0df2be update es to 6.3.0 2018-06-19 09:14:57 +08:00
medcl
36e6d2d00b update travis 2018-05-06 17:06:19 +08:00
medcl
de1da42d38 update travis 2018-05-06 16:55:07 +08:00
zj0713001
3dcedde9e4 update es to 6.2.4 (#545) 2018-05-04 16:31:35 +08:00
medcl
21a859a48d Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2018-04-09 15:59:12 +08:00
medcl
816b8ddd4b fix ambiguous 2018-04-09 15:58:43 +08:00
Figroc Chen
7028b9ea05 BOM handling of dict file (#517)
Signed-off-by: Peng Chen <figroc@gmail.com>
2018-04-02 13:29:22 +08:00
medcl
4ab2616a96 update es to 6.2.3 2018-04-02 12:24:48 +08:00
medcl
7c9b4771b3 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2018-03-05 15:29:35 -08:00
medcl
22de5be444 update es to 6.2.2 2018-03-05 15:29:25 -08:00
Figroc Chen
0e8ddbd749 ext dic&stopwords can be dir (#437)
allow dir for ext_dict and ext_stopwords in IKAnalyzer.cfg.xml
By: Peng Chen<figroc@gmail.com>
2018-02-24 16:40:21 +08:00
medcl
7a1445fdda update plugin-descriptor.properties, Close #514 2018-02-10 14:07:20 +08:00
medcl
353cefd5b8 fix example, Close #512 2018-02-09 12:27:46 +08:00
medcl
0922152fb8 update es to 6.2.1 2018-02-09 12:12:22 +08:00
muliuyun
cc01c881af Update ES to 6.1.3 (#510)
* update es to 6.1.3
2018-02-09 11:44:56 +08:00
medcl
eb21c796d8 update es to 6.1.2 2018-01-19 17:32:41 +08:00
medcl
5828cb1c72 update es to 6.1.1 2017-12-20 09:59:49 +08:00
medcl
dc739d2cee update es to 6.0.1 2017-12-20 09:57:14 +08:00
medcl
2851cc2501 update es to 6.0.0 2017-11-15 20:04:49 +08:00
medcl
b32366489b update es to 5.6.4 2017-11-15 19:57:30 +08:00
medcl
6a55e3af76 update es to 5.6.3 2017-10-19 09:56:51 +02:00
medcl
7636e1a234 update es to 5.6.2 2017-10-19 09:51:39 +02:00
medcl
1f2dfbffd5 update es to 5.6.1 2017-09-19 15:35:31 +08:00
medcl
2541e35991 update es to 5.6.0 2017-09-15 10:53:29 +08:00
medcl
55a4f05666 update es to 5.5.3 2017-09-15 10:47:47 +08:00
medcl
6309787f94 update es to 5.5.2 2017-08-30 20:34:07 +08:00
medcl
c4c498a3aa update example 2017-08-03 17:10:30 +08:00
medcl
8da12f3492 update es to 5.5.1 2017-08-03 17:00:02 +08:00
medcl
50230bfa64 fix install by plugin command 2017-08-03 16:59:35 +08:00
杨晓东
adf282f115 Adjust the plugin version in pom.xml to 5.5.0 (#401)
* Commit

Signed-off-by: 杨晓东 <03131302@163.com>

* Update README.md
2017-07-12 20:26:48 +08:00
medcl
1a62eb1651 update es to 5.4.3 2017-07-01 17:48:25 +08:00
medcl
455b672e5a update es to 5.4.2 2017-06-22 10:19:00 +08:00
medcl
2d16b56728 update es to 5.4.1 2017-05-16 10:03:55 +08:00
Zhang Yixin
1987d6ace4 update es to 5.4.0 (#369) 2017-05-16 09:48:03 +08:00
medcl
60e5e7768f update es to 5.3.2 2017-04-28 15:26:29 +08:00
medcl
7dfeb25c8f update readme 2017-04-07 21:11:22 +08:00
medcl
e7d968ffa8 update es to 5.3.0 2017-04-01 14:52:04 +08:00
medcl
a1fea66be8 update es to 5.2.2 2017-03-02 22:53:58 +08:00
medcl
dbb45eec56 update es to 5.2.1 2017-02-15 12:44:59 +08:00
medcl
c5a1553850 update oss version 2017-02-05 18:17:13 +08:00
medcl
400206511d update to es 5.2.0 2017-02-05 18:15:43 +08:00
medcl
d1d216a195 update es to 5.1.2 2017-01-19 10:18:34 +08:00
medcl
494576998a update es to 5.1.1 2016-12-13 17:33:10 +08:00
medcl
b85b487569 update es to v5.0.2 2016-11-30 09:33:29 +08:00
medcl
ffb88ee0fa update es to v5.0.1 2016-11-16 11:45:47 +08:00
medcl
e08d9d9be5 update es to 5.0.0 2016-10-27 16:10:56 +08:00
medcl
754572b2b9 update es to 5.0.0-rc1 2016-10-13 16:26:43 +08:00
medcl
e0ada4440e update README 2016-09-28 12:17:11 +02:00
medcl
17f6e982a5 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2016-09-28 12:16:09 +02:00
medcl
b6ec9c0a00 update to support es5.0.0-beta1, Closes #282 2016-09-28 12:14:24 +02:00
Hsu Chen-Wei
7c92a10fc0 Add steps for installation (#268)
Tell user to switch tags before compiling.
2016-09-06 04:43:23 +03:00
medcl
f28ec3c3c2 update travis config 2016-08-23 11:39:19 +08:00
medcl
bfcebccd0f update readme 2016-08-23 00:33:52 +08:00
medcl
82c6369501 unify compiler plugin version 2016-08-18 16:28:22 +08:00
medcl
e637c4b1b2 update readme,pom.xml 2016-08-18 15:51:34 +08:00
medcl
ac2b78acd0 bump up compiler to use 1.8 2016-08-18 15:26:46 +08:00
medcl
168a798da8 support es 5.0.0-alpha5 2016-08-18 11:25:45 +08:00
medcl
efb393f3d7 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2016-08-15 10:30:05 +08:00
medcl
2df086b082 support maven release 2016-08-15 10:29:54 +08:00
lixiaohui
b11aec33c2 update for elasticsearch 2.3.5 (#259) 2016-08-12 09:52:58 +08:00
medcl
7e86d7390a remove unused classes 2016-07-25 20:54:57 +08:00
medcl
341b586373 add config to enable/disable lowercase and remote_dict, Closes #241 2016-07-25 10:55:25 +08:00
medcl
b662596939 update to support es 2.3.4, Closes #236 2016-07-14 23:49:04 +08:00
tangyu
82432f1059 fix: when the ETag header does not exist, a java.lang.NullPointerException is thrown #223 (#224) 2016-06-29 10:30:31 +08:00
Pengcheng Huang
4373cf7c94 JDK8 and Maven compatibility (#210)
Disable doclint
2016-06-13 17:41:01 +08:00
medcl
f1d59921fe Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2016-06-13 10:34:20 +08:00
medcl
c94ef3884a Fix maven empty assembly id, Closes #208 2016-06-13 10:34:01 +08:00
Robert LU
26fe905cc6 Also load config from /etc/elasticsearch/analysis-ik (#197)
Support install by `bin/plugin`, dealing with config file relocation
2016-05-25 17:07:25 +08:00
medcl
7e29998ab9 update es to 2.3.3, Closes #194 2016-05-23 10:01:48 +08:00
medcl
cd9dfdf225 update es version to 2.3.2 2016-05-01 09:18:40 +08:00
medcl
2dfe76375a Merge branch 'DevFactory-release/use-logger-to-log-exceptions-fix-1' 2016-04-10 22:18:08 +08:00
medcl
ca2bfe5732 merge code 2016-04-10 22:17:59 +08:00
Medcl
3bd1490225 Merge pull request #177 from tokikanno/for-es.2.3.1
For ES 2.3.1
2016-04-07 23:00:29 +08:00
toki.kanno
c2244ccd80 revert installation command 2016-04-07 15:26:41 +08:00
toki.kanno
800097d776 new installation guide for ES2.X 2016-04-07 12:28:49 +08:00
toki.kanno
e14b7a0df7 update version info in pom.xml and README.md for ES 2.3.1 2016-04-07 12:23:54 +08:00
medcl
ffe9d9b8e7 update to es2.3.0, Closes #174 2016-04-02 22:40:47 +08:00
medcl
8640036645 update version to v1.8.1 2016-03-28 22:44:31 +08:00
medcl
b344f2e707 fix default analysis setting 2016-03-28 22:22:44 +08:00
Medcl
e673eca316 Merge pull request #144 from DevFactory/release/redundant-nullcheck-fix-1
Fixing redundant null check of value known to be non-null.
2016-03-27 21:12:43 +08:00
Medcl
c8f4d59f13 Merge pull request #145 from DevFactory/release/value-of-and-double-check-fix-1
Fixing the use of the inefficient Number constructor and double-checked locking
2016-03-27 21:12:10 +08:00
medcl
d15e4e0629 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2016-03-27 20:55:13 +08:00
medcl
0783d0145c merge and replace secure str 2016-03-27 20:54:34 +08:00
medcl
40f70d8ca2 update to 2.2.1 2016-03-27 10:12:00 +08:00
鲁严波
2df614c639 add travis ci for auto package 2016-03-25 16:12:26 +08:00
medcl
abc94db3d3 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2016-03-02 23:53:12 +08:00
Medcl
8c50edf0e0 Merge pull request #150 from DevFactory/release/casting-math-operands-fix-1
Casting math operands before assignment
2016-02-29 10:23:09 +08:00
Medcl
28a1c8bdf9 Merge pull request #149 from DevFactory/release/make-public-static-constant-fix-1
public static" fields should be constant
2016-02-29 10:22:44 +08:00
medcl
53d1c6647b update version matrix 2016-02-20 09:23:03 -08:00
medcl
099b1f296e update to v1.8.0, support es 2.2.0 2016-02-20 09:20:40 -08:00
ayman abdelghany
ba1823fb51 "public static" fields should be constant 2016-02-11 20:08:19 +02:00
ayman abdelghany
9768188eb2 Casting math operands before assignment 2016-02-11 20:04:45 +02:00
Ayman Abdel Ghany
5ec0bd5bd6 Use logger to log exceptions instead of printStackTrace(...) 2016-01-29 14:47:03 +02:00
Ayman Abdel Ghany
81ea266414 - replacing the inefficient Number constructor with static valueOf instead
- remove double-checked locking
2016-01-26 23:24:24 +02:00
Ayman Abdel Ghany
3d8fa5bee0 Fixing redundant nullcheck of value known to be non-null. 2016-01-26 22:23:30 +02:00
Medcl
3f0214a8e3 Merge pull request #140 from DevFactory/release/merge-if-with-enclosing-one-fix-1
Merging collapsible if statements increases the code's readability.
2016-01-26 09:43:41 +08:00
Medcl
afe9345ba5 Merge pull request #141 from DevFactory/release/use-log-instead-of-standard-output-fix-1
Replace the usage of System.out or System.err by a logger
2016-01-26 09:41:42 +08:00
Ayman Abdel Ghany
1eed772f34 Replace the usage of System.out or System.err by a logger 2016-01-21 23:03:45 +02:00
Ayman Abdel Ghany
5fb03d2751 Merging collapsible if statements increases the code's readability. 2016-01-21 19:13:41 +02:00
medcl
71b5211781 pretty logging 2016-01-13 10:53:05 +08:00
medcl
7bf3f97eaa update README 2016-01-10 10:39:42 +08:00
medcl
f9977456ee move config files stay with plugin 2016-01-10 10:33:07 +08:00
medcl
5b95ceb25a add some builtin words to dict file 2016-01-10 10:29:21 +08:00
medcl
5c325d5391 update es to 2.1.1 2015-12-25 16:12:44 +08:00
medcl
478aeba889 add ik url to README 2015-12-25 14:35:28 +08:00
medcl
680206d57f update README 2015-12-25 14:34:04 +08:00
medcl
2ea4c922de add deprecated annotation to unused class 2015-12-25 14:32:29 +08:00
medcl
c3c4b3c3a8 fix plugin name in plugin-descriptor.properties 2015-12-25 14:26:36 +08:00
medcl
33fb6ad67e fix null exception and update to support es2.1 2015-12-01 21:56:18 +08:00
Medcl
ce6424dd3f Merge pull request #116 from wyhw/upgrade2_1
Elasticsearch upgrade 2.1
2015-11-26 23:02:42 +08:00
wyhw
557d41f29a Elasticsearch upgrade 2.1 2015-11-26 16:23:47 +08:00
wyhw
961b3a3f55 Elasticsearch upgrade 2.1 2015-11-26 16:20:34 +08:00
medcl
ad883b6e79 add missing license file 2015-11-26 16:13:15 +08:00
medcl
07cdb4caf2 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2015-11-26 16:02:54 +08:00
medcl
c77cfcb6ef update scope to GLOBAL 2015-11-26 16:02:31 +08:00
Medcl
155203b3e2 Merge pull request #111 from chuangbo/patch-1
Fix readme installation zip file name
2015-11-24 20:32:23 +08:00
Li Chuangbo
82e2628d77 Fix readme installation zip file name 2015-11-23 19:26:21 +13:00
Medcl
9035fa7e2b Merge pull request #105 from snow/master
FIX: url format in readme 
Thanks! @snow
2015-11-10 20:57:44 +08:00
Snow Helsing
6e9e3c2046 FIX: url format in readme 2015-11-10 17:15:10 +08:00
medcl
b8a8cb6ae2 fix README 2015-10-31 22:14:02 +08:00
medcl
90ff4e4efb delete test config file 2015-10-31 21:12:35 +08:00
medcl
f1e7ad645b delete useless config files 2015-10-31 21:09:47 +08:00
medcl
3d47fa6021 update to support es 2.0 2015-10-31 20:59:13 +08:00
medcl
a60059f8b1 update README,bump up version 2015-09-25 12:16:56 +08:00
Medcl
2b13e2a42e Merge pull request #87 from shikui/patch-1
Handle HTTP 304 without writing to the log
2015-08-12 10:43:58 +08:00
shikui
7cdb1773ec Handle HTTP 304 without writing to the log 2015-08-07 22:57:03 +08:00
Medcl
e8e45dff05 Merge pull request #85 from shikui/master
Improve the documentation of use_smart; change ETags to ETag to stay consistent with the code.
2015-08-07 16:44:07 +08:00
Medcl
b3165ceb36 Merge pull request #86 from starckgates/master
Remove the HTTP 304 handling
2015-08-07 16:39:34 +08:00
songliu
86813f5be9 Remove the HTTP 304 handling
304 should not be handled the same way as 200; otherwise hot-word updates lose their purpose.
2015-08-07 15:10:27 +08:00
shikui
6dfda67200 Update README.md
The standard HTTP protocol returns ETag, not ETags; the code has already been changed to ETag.
2015-08-07 11:43:00 +08:00
shikui
eabaaaff4f update README.md
Describe use_smart in detail.
2015-08-07 11:34:45 +08:00
shikui
664e2b96df Merge pull request #1 from medcl/master
merged from head fork
2015-08-07 11:02:33 +08:00
Medcl
44fdf68188 Merge pull request #82 from abookyun/update-readme
Update README.md
2015-08-07 08:02:20 +08:00
David Yun
36ab8b912a Restructure README.md 2015-08-06 11:31:27 +08:00
Medcl
497dfd95a9 Merge pull request #80 from abookyun/patch-1
Update and rename README.textile to README.md
2015-08-06 07:36:28 +08:00
David Yun
8f732ed346 Update and rename README.textile to README.md
It's more readable with markdown format 😎
2015-08-06 00:30:28 +08:00
Medcl
7dcffada95 Merge pull request #78 from shikui/master
1. Treat HTTP 304 as a normal status; 2. It should be ETag, not ETags
2015-08-04 11:50:39 +08:00
shikui
f5aef261d3 update elasticsearch to 1.6.2 2015-08-03 17:50:44 +08:00
shikui
2cbd91c6c0 1. Treat HTTP 304 as a normal status; 2. It should be ETag, not ETags
1. Treat HTTP 304 as a normal status, to avoid flooding the log file with 304 errors; 2. It should be ETag, not ETags
2015-08-03 17:46:23 +08:00
medcl
2f367029e4 update to v1.4 2015-07-02 22:29:18 +08:00
51 changed files with 2099 additions and 2337 deletions

2
.github/FUNDING.yml vendored Normal file

@ -0,0 +1,2 @@
patreon: medcl
custom: ["https://www.buymeacoffee.com/medcl"]

1
.gitignore vendored

@ -7,3 +7,4 @@
.DS_Store
*.iml
\.*
!.travis.yml

9
.travis.yml Normal file

@ -0,0 +1,9 @@
sudo: required
jdk:
- oraclejdk8
install: true
script:
- sudo apt-get update && sudo apt-get install oracle-java8-installer
- java -version
language: java
script: mvn clean package

202
LICENSE.txt Normal file

@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

261
README.md Normal file

@ -0,0 +1,261 @@
IK Analysis for Elasticsearch
=============================
The IK Analysis plugin integrates the Lucene IK analyzer (http://code.google.com/p/ik-analyzer/) into elasticsearch and supports customized dictionaries.
Analyzers: `ik_smart`, `ik_max_word`; Tokenizers: `ik_smart`, `ik_max_word`
Versions
--------
IK version | ES version
-----------|-----------
master | 7.x -> master
6.x| 6.x
5.x| 5.x
1.10.6 | 2.4.6
1.9.5 | 2.3.5
1.8.1 | 2.2.1
1.7.0 | 2.1.1
1.5.0 | 2.0.0
1.2.6 | 1.0.0
1.2.5 | 0.90.x
1.1.3 | 0.20.x
1.0.0 | 0.16.2 -> 0.19.0
Install
-------
1. download or compile
* option 1 - download the pre-built package from here: https://github.com/medcl/elasticsearch-analysis-ik/releases
create the plugin folder: `cd your-es-root/plugins/ && mkdir ik`
unzip the plugin into `your-es-root/plugins/ik`
* option 2 - use elasticsearch-plugin to install (supported since v5.5.1):
```
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
```
NOTE: replace `6.3.0` with your own elasticsearch version
2. restart elasticsearch
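After the restart, you can confirm that the plugin was loaded via the cat plugins API (a quick sanity check; the exact output format varies across elasticsearch versions):
```bash
curl -XGET http://localhost:9200/_cat/plugins
# each node should report a line containing: analysis-ik <your-es-version>
```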
#### Quick Example
1. create an index
```bash
curl -XPUT http://localhost:9200/index
```
2. create a mapping
```bash
curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d'
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
}
}
}'
```
3. index some docs
```bash
curl -XPOST http://localhost:9200/index/_create/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
```
```bash
curl -XPOST http://localhost:9200/index/_create/2 -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'
```
```bash
curl -XPOST http://localhost:9200/index/_create/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查韩警平均每天扣1艘中国渔船"}
'
```
```bash
curl -XPOST http://localhost:9200/index/_create/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
```
4. query with highlighting
```bash
curl -XPOST http://localhost:9200/index/_search -H 'Content-Type:application/json' -d'
{
"query" : { "match" : { "content" : "中国" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}
'
```
Result
```json
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2,
"hits": [
{
"_index": "index",
"_type": "fulltext",
"_id": "4",
"_score": 2,
"_source": {
"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
},
"highlight": {
"content": [
"<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
]
}
},
{
"_index": "index",
"_type": "fulltext",
"_id": "3",
"_score": 2,
"_source": {
"content": "中韩渔警冲突调查韩警平均每天扣1艘中国渔船"
},
"highlight": {
"content": [
"均每天扣1艘<tag1>中国</tag1>渔船 "
]
}
}
]
}
}
```
### Dictionary Configuration
`IKAnalyzer.cfg.xml` can be located at `{conf}/analysis-ik/config/IKAnalyzer.cfg.xml`
or `{plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml`
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer 扩展配置</comment>
<!--用户可以在这里配置自己的扩展字典 -->
<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
<!--用户可以在这里配置自己的扩展停止词字典-->
<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
<!--用户可以在这里配置远程扩展字典 -->
<entry key="remote_ext_dict">location</entry>
<!--用户可以在这里配置远程扩展停止词字典-->
<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
```
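The extension dictionary files referenced above (e.g. `custom/mydict.dic`) are plain text: one word per line, UTF-8 encoded. An illustrative sketch of such a file (the entries are arbitrary examples):
```
金力泰
泰合
```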
### Hot-Reloading the IK Dictionary
The plugin currently supports hot reloading of the IK dictionary, via the following entries in the IK configuration file mentioned above
```xml
<!--用户可以在这里配置远程扩展字典 -->
<entry key="remote_ext_dict">location</entry>
<!--用户可以在这里配置远程扩展停止词字典-->
<entry key="remote_ext_stopwords">location</entry>
```
Here `location` is a URL, for example `http://yoursite.com/getCustomDict`. The request only needs to satisfy the following two points to enable hot dictionary reloading.
1. The http response must return two headers: `Last-Modified` and `ETag`, both strings. Whenever either one changes, the plugin fetches the new word list and updates its dictionary.
2. The http response body contains one word per line, with `\n` as the line separator.
Once these two requirements are met, hot dictionary updates work without restarting the ES instance.
You can put the hot words that need automatic updating into a UTF-8 encoded .txt file served by nginx or another simple http server; when the .txt file changes, the http server automatically returns the corresponding Last-Modified and ETag as clients request the file. You can additionally build a tool that extracts relevant vocabulary from your business systems and updates this .txt file.
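To check whether a remote endpoint qualifies for hot reloading, you can inspect its response headers (a sketch using the placeholder URL from above; the header values shown are purely illustrative):
```bash
curl -I http://yoursite.com/getCustomDict
# HTTP/1.1 200 OK
# Last-Modified: Fri, 02 Sep 2022 10:00:00 GMT
# ETag: "5d8c72a5edda8d6a"
```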
have fun.
FAQ
-------
1. Why doesn't my custom dictionary take effect?
Make sure your extension dictionary is UTF-8 encoded text
2. How do I install manually?
```bash
git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
git checkout tags/{version}
mvn clean
mvn compile
mvn package
```
Copy and unzip the release file #{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip into your elasticsearch plugins directory, e.g. plugins/ik
Restart elasticsearch
3. Analysis test fails
Call the analyze API under a specific index for testing, rather than calling the analyze API directly
For example:
```bash
curl -XGET "http://localhost:9200/your_index/_analyze" -H 'Content-Type: application/json' -d'
{
"text":"中华人民共和国MN","tokenizer": "my_ik"
}'
```
4. What is the difference between ik_max_word and ik_smart?
ik_max_word: performs the finest-grained segmentation; for example, it splits “中华人民共和国国歌” into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting the possible combinations; suitable for Term Query.
ik_smart: performs the coarsest-grained segmentation; for example, it splits “中华人民共和国国歌” into “中华人民共和国,国歌”; suitable for Phrase queries.
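To see the difference for yourself, you can compare the two analyzers with the `_analyze` API (a sketch run against the index created in the quick example above):
```bash
curl -XGET "http://localhost:9200/index/_analyze" -H 'Content-Type: application/json' -d'
{"analyzer": "ik_max_word", "text": "中华人民共和国国歌"}'

curl -XGET "http://localhost:9200/index/_analyze" -H 'Content-Type: application/json' -d'
{"analyzer": "ik_smart", "text": "中华人民共和国国歌"}'
```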
Changes
------
*Since v5.0.0*
- The analyzer and tokenizer named `ik` have been removed; use `ik_smart` and `ik_max_word` instead
Thanks
------
YourKit supports IK Analysis for ElasticSearch project with its full-featured Java Profiler.
YourKit, LLC is the creator of innovative and intelligent tools for profiling
Java and .NET applications. Take a look at YourKit's leading software products:
<a href="http://www.yourkit.com/java/profiler/index.jsp">YourKit Java Profiler</a> and
<a href="http://www.yourkit.com/.net/profiler/index.jsp">YourKit .NET Profiler</a>.

README.textile

@ -1,258 +0,0 @@
IK Analysis for ElasticSearch
==================================
Update notes:
For es clusters that use ik as the analysis plugin, custom dictionaries are modified frequently; with remote loading added, every update reloads the dictionary without restarting the es service.
The IK Analysis plugin integrates the Lucene IK analyzer into elasticsearch and supports customized dictionaries.
Tokenizer: `ik`
Version
-------------
master | 1.5.0 -> master
1.3.0 | 1.5.0
1.2.9 | 1.4.0
1.2.8 | 1.3.2
1.2.7 | 1.2.1
1.2.6 | 1.0.0
1.2.5 | 0.90.2
1.2.3 | 0.90.2
1.2.0 | 0.90.0
1.1.3 | 0.20.2
1.1.2 | 0.19.x
1.0.0 | 0.16.2 -> 0.19.0
Thanks
-------------
YourKit supports IK Analysis for ElasticSearch project with its full-featured Java Profiler.
YourKit, LLC is the creator of innovative and intelligent tools for profiling
Java and .NET applications. Take a look at YourKit's leading software products:
<a href="http://www.yourkit.com/java/profiler/index.jsp">YourKit Java Profiler</a> and
<a href="http://www.yourkit.com/.net/profiler/index.jsp">YourKit .NET Profiler</a>.
Install
-------------
you can download this plugin from the RTF project (https://github.com/medcl/elasticsearch-rtf)
https://github.com/medcl/elasticsearch-rtf/tree/master/plugins/analysis-ik
https://github.com/medcl/elasticsearch-rtf/tree/master/config/ik
<del>also remember to download the dict files, and unzip these dict files into your elasticsearch config folder, such as: your-es-root/config/ik</del>
you need a service restart after that!
Dict Configuration (es-root/config/ik/IKAnalyzer.cfg.xml)
-------------
https://github.com/medcl/elasticsearch-analysis-ik/blob/master/config/ik/IKAnalyzer.cfg.xml
<pre>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer 扩展配置</comment>
<!--用户可以在这里配置自己的扩展字典 -->
<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
<!--用户可以在这里配置自己的扩展停止词字典-->
<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
<!--用户可以在这里配置远程扩展字典 -->
<entry key="remote_ext_dict">location</entry>
<!--用户可以在这里配置远程扩展停止词字典-->
<entry key="remote_ext_stopwords">location</entry>
</properties>
</pre>
Analysis Configuration (elasticsearch.yml)
-------------
<pre>
index:
analysis:
analyzer:
ik:
alias: [ik_analyzer]
type: org.elasticsearch.index.analysis.IkAnalyzerProvider
ik_max_word:
type: ik
use_smart: false
ik_smart:
type: ik
use_smart: true
</pre>
Or
<pre>
index.analysis.analyzer.ik.type : "ik"
</pre>
you can set your preferred segmentation mode; the default `use_smart` is false.
Mapping Configuration
-------------
Here is a quick example:
1. create an index
<pre>
curl -XPUT http://localhost:9200/index
</pre>
2.create a mapping
<pre>
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
"fulltext": {
"_all": {
"indexAnalyzer": "ik",
"searchAnalyzer": "ik",
"term_vector": "no",
"store": "false"
},
"properties": {
"content": {
"type": "string",
"store": "no",
"term_vector": "with_positions_offsets",
"indexAnalyzer": "ik",
"searchAnalyzer": "ik",
"include_in_all": "true",
"boost": 8
}
}
}
}'
</pre>
3.index some docs
<pre>
curl -XPOST http://localhost:9200/index/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://localhost:9200/index/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://localhost:9200/index/fulltext/3 -d'
{"content":"中韩渔警冲突调查韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://localhost:9200/index/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
</pre>
4.query with highlighting
<pre>
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
{
"query" : { "term" : { "content" : "中国" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
"fields" : {
"content" : {}
}
}
}
'
</pre>
here is the query result
<pre>
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2,
"hits": [
{
"_index": "index",
"_type": "fulltext",
"_id": "4",
"_score": 2,
"_source": {
"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
},
"highlight": {
"content": [
"<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
]
}
},
{
"_index": "index",
"_type": "fulltext",
"_id": "3",
"_score": 2,
"_source": {
"content": "中韩渔警冲突调查韩警平均每天扣1艘中国渔船"
},
"highlight": {
"content": [
"均每天扣1艘<tag1>中国</tag1>渔船 "
]
}
}
]
}
}
</pre>
have fun.
Hot-reloading the IK dictionary
----------
The plugin currently supports hot reloading of the IK dictionary, via the following entries in the ik configuration file mentioned above
<pre>
<!--用户可以在这里配置远程扩展字典 -->
<entry key="remote_ext_dict">location</entry>
<!--用户可以在这里配置远程扩展停止词字典-->
<entry key="remote_ext_stopwords">location</entry>
</pre>
Here `location` is a url, for example `http://yoursite.com/getCustomDict`. The request only needs to satisfy the following two points to enable hot dictionary reloading.
1. The http response must return two headers, one `Last-Modified` and one `ETags`, both strings; whenever either one changes, the plugin fetches the new word list and updates its dictionary.
2. The http response body contains one word per line, with `\n` as the line separator.
Once these two requirements are met, hot dictionary updates work without restarting the es instance.
FAQ:
-------------
1. Why doesn't my custom dictionary take effect?
Make sure your extension dictionary is UTF-8 encoded text
2. How to install manually, taking 1.3.0 as an example; see: https://github.com/medcl/elasticsearch-analysis-ik/issues/46
`git clone https://github.com/medcl/elasticsearch-analysis-ik`
`cd elasticsearch-analysis-ik`
`mvn compile`
`mvn package`
`plugin --install analysis-ik --url file:///#{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-1.3.0.zip`

IKAnalyzer.cfg.xml

@ -1,12 +1,12 @@
 <?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
-<properties>
+<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
+<properties>
 <comment>IK Analyzer 扩展配置</comment>
-<!--用户可以在这里配置自己的扩展字典 -->
-<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
+<!--用户可以在这里配置自己的扩展字典 -->
+<entry key="ext_dict"></entry>
 <!--用户可以在这里配置自己的扩展停止词字典-->
-<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
-<!--用户可以在这里配置远程扩展字典 -->
+<entry key="ext_stopwords"></entry>
+<!--用户可以在这里配置远程扩展字典 -->
+<!-- <entry key="remote_ext_dict">words_location</entry> -->
 <!--用户可以在这里配置远程扩展停止词字典-->
+<!-- <entry key="remote_ext_stopwords">words_location</entry> -->


@ -1,8 +0,0 @@
index:
analysis:
analyzer:
ik:
alias: [news_analyzer_ik,ik_analyzer]
type: org.elasticsearch.index.analysis.IkAnalyzerProvider
index.analysis.analyzer.default.type : "ik"


@ -1,2 +0,0 @@
medcl


@ -1,44 +0,0 @@
rootLogger: INFO, console, file
logger:
# log action execution errors for easier debugging
action: DEBUG
# reduce the logging for aws, too much is logged under the default INFO
com.amazonaws: WARN
# gateway
#gateway: DEBUG
#index.gateway: DEBUG
# peer shard recovery
#indices.recovery: DEBUG
# discovery
#discovery: TRACE
index.search.slowlog: TRACE, index_search_slow_log_file
additivity:
index.search.slowlog: false
appender:
console:
type: console
layout:
type: consolePattern
conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
file:
type: dailyRollingFile
file: ${path.logs}/${cluster.name}.log
datePattern: "'.'yyyy-MM-dd"
layout:
type: pattern
conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
index_search_slow_log_file:
type: dailyRollingFile
file: ${path.logs}/${cluster.name}_index_search_slowlog.log
datePattern: "'.'yyyy-MM-dd"
layout:
type: pattern
conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

475
licenses/lucene-LICENSE.txt Normal file

@ -0,0 +1,475 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Some code in core/src/java/org/apache/lucene/util/UnicodeUtil.java was
derived from unicode conversion examples available at
http://www.unicode.org/Public/PROGRAMS/CVTUTF. Here is the copyright
from those sources:
/*
* Copyright 2001-2004 Unicode, Inc.
*
* Disclaimer
*
* This source code is provided as is by Unicode, Inc. No claims are
* made as to fitness for any particular purpose. No warranties of any
* kind are expressed or implied. The recipient agrees to determine
* applicability of information provided. If this file has been
* purchased on magnetic or optical media from Unicode, Inc., the
* sole remedy for any claim will be exchange of defective media
* within 90 days of receipt.
*
* Limitations on Rights to Redistribute This Code
*
* Unicode, Inc. hereby grants the right to freely use the information
* supplied in this file in the creation of products supporting the
* Unicode Standard, and to make copies of this file in any form
* for internal or external distribution as long as this notice
* remains attached.
*/
Some code in core/src/java/org/apache/lucene/util/ArrayUtil.java was
derived from Python 2.4.2 sources available at
http://www.python.org. Full license is here:
http://www.python.org/download/releases/2.4.2/license/
Some code in core/src/java/org/apache/lucene/util/UnicodeUtil.java was
derived from Python 3.1.2 sources available at
http://www.python.org. Full license is here:
http://www.python.org/download/releases/3.1.2/license/
Some code in core/src/java/org/apache/lucene/util/automaton was
derived from Brics automaton sources available at
www.brics.dk/automaton/. Here is the copyright from those sources:
/*
* Copyright (c) 2001-2009 Anders Moeller
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. The name of the author may not be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
The levenshtein automata tables in core/src/java/org/apache/lucene/util/automaton
were automatically generated with the moman/finenight FSA package.
Here is the copyright for those sources:
# Copyright (c) 2010, Jean-Philippe Barrette-LaPierre, <jpb@rrette.com>
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
Some code in core/src/java/org/apache/lucene/util/UnicodeUtil.java was
derived from ICU (http://www.icu-project.org)
The full license is available here:
http://source.icu-project.org/repos/icu/icu/trunk/license.html
/*
* Copyright (C) 1999-2010, International Business Machines
* Corporation and others. All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, and/or sell copies of the
* Software, and to permit persons to whom the Software is furnished to do so,
* provided that the above copyright notice(s) and this permission notice appear
* in all copies of the Software and that both the above copyright notice(s) and
* this permission notice appear in supporting documentation.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS.
* IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE
* LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR
* ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
* IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
* OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
* Except as contained in this notice, the name of a copyright holder shall not
* be used in advertising or otherwise to promote the sale, use or other
* dealings in this Software without prior written authorization of the
* copyright holder.
*/
The following license applies to the Snowball stemmers:
Copyright (c) 2001, Dr Martin Porter
Copyright (c) 2002, Richard Boulton
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holders nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The following license applies to the KStemmer:
Copyright © 2003,
Center for Intelligent Information Retrieval,
University of Massachusetts, Amherst.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. The names "Center for Intelligent Information Retrieval" and
"University of Massachusetts" must not be used to endorse or promote products
derived from this software without prior written permission. To obtain
permission, contact info@ciir.cs.umass.edu.
THIS SOFTWARE IS PROVIDED BY UNIVERSITY OF MASSACHUSETTS AND OTHER CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
The following license applies to the Morfologik project:
Copyright (c) 2006 Dawid Weiss
Copyright (c) 2007-2011 Dawid Weiss, Marcin Miłkowski
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Morfologik nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
---
The dictionary comes from Morfologik project. Morfologik uses data from
Polish ispell/myspell dictionary hosted at http://www.sjp.pl/slownik/en/ and
is licenced on the terms of (inter alia) LGPL and Creative Commons
ShareAlike. The part-of-speech tags were added in Morfologik project and
are not found in the data from sjp.pl. The tagset is similar to IPI PAN
tagset.
---
The following license applies to the Morfeusz project,
used by org.apache.lucene.analysis.morfologik.
BSD-licensed dictionary of Polish (SGJP)
http://sgjp.pl/morfeusz/
Copyright © 2011 Zygmunt Saloni, Włodzimierz Gruszczyński,
Marcin Woliński, Robert Wołosz
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the
distribution.
THIS SOFTWARE IS PROVIDED BY COPYRIGHT HOLDERS “AS IS” AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

licenses/lucene-NOTICE.txt Normal file
@ -0,0 +1,191 @@
Apache Lucene
Copyright 2014 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
Includes software from other Apache Software Foundation projects,
including, but not limited to:
- Apache Ant
- Apache Jakarta Regexp
- Apache Commons
- Apache Xerces
ICU4J (under analysis/icu) is licensed under an MIT-style license
and Copyright (c) 1995-2008 International Business Machines Corporation and others
Some data files (under analysis/icu/src/data) are derived from Unicode data such
as the Unicode Character Database. See http://unicode.org/copyright.html for more
details.
Brics Automaton (under core/src/java/org/apache/lucene/util/automaton) is
BSD-licensed, created by Anders Møller. See http://www.brics.dk/automaton/
The levenshtein automata tables (under core/src/java/org/apache/lucene/util/automaton) were
automatically generated with the moman/finenight FSA library, created by
Jean-Philippe Barrette-LaPierre. This library is available under an MIT license,
see http://sites.google.com/site/rrettesite/moman and
http://bitbucket.org/jpbarrette/moman/overview/
The class org.apache.lucene.util.WeakIdentityMap was derived from
the Apache CXF project and is Apache License 2.0.
The Google Code Prettify is Apache License 2.0.
See http://code.google.com/p/google-code-prettify/
JUnit (junit-4.10) is licensed under the Common Public License v. 1.0
See http://junit.sourceforge.net/cpl-v10.html
This product includes code (JaspellTernarySearchTrie) from the Java Spelling Checking Package (jaspell): http://jaspell.sourceforge.net/
License: The BSD License (http://www.opensource.org/licenses/bsd-license.php)
The snowball stemmers in
analysis/common/src/java/net/sf/snowball
were developed by Martin Porter and Richard Boulton.
The snowball stopword lists in
analysis/common/src/resources/org/apache/lucene/analysis/snowball
were developed by Martin Porter and Richard Boulton.
The full snowball package is available from
http://snowball.tartarus.org/
The KStem stemmer in
analysis/common/src/org/apache/lucene/analysis/en
was developed by Bob Krovetz and Sergio Guzman-Lara (CIIR-UMass Amherst)
under the BSD-license.
The Arabic, Persian, Romanian, Bulgarian, and Hindi analyzers (common) come with a default
stopword list that is BSD-licensed created by Jacques Savoy. These files reside in:
analysis/common/src/resources/org/apache/lucene/analysis/ar/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/fa/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/ro/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/bg/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/hi/stopwords.txt
See http://members.unine.ch/jacques.savoy/clef/index.html.
The German, Spanish, Finnish, French, Hungarian, Italian, Portuguese, Russian, and Swedish light stemmers
(common) are based on BSD-licensed reference implementations created by Jacques Savoy and
Ljiljana Dolamic. These files reside in:
analysis/common/src/java/org/apache/lucene/analysis/de/GermanLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/de/GermanMinimalStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fi/FinnishLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/hu/HungarianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/pt/PortugueseLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/ru/RussianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/sv/SwedishLightStemmer.java
The Stempel analyzer (stempel) includes BSD-licensed software developed
by the Egothor project http://egothor.sf.net/, created by Leo Galambos, Martin Kvapil,
and Edmond Nolan.
The Polish analyzer (stempel) comes with a default
stopword list that is BSD-licensed created by the Carrot2 project. The file resides
in stempel/src/resources/org/apache/lucene/analysis/pl/stopwords.txt.
See http://project.carrot2.org/license.html.
The SmartChineseAnalyzer source code (smartcn) was
provided by Xiaoping Gao and copyright 2009 by www.imdict.net.
WordBreakTestUnicode_*.java (under modules/analysis/common/src/test/)
is derived from Unicode data such as the Unicode Character Database.
See http://unicode.org/copyright.html for more details.
The Morfologik analyzer (morfologik) includes BSD-licensed software
developed by Dawid Weiss and Marcin Miłkowski (http://morfologik.blogspot.com/).
Morfologik uses data from Polish ispell/myspell dictionary
(http://www.sjp.pl/slownik/en/) licenced on the terms of (inter alia)
LGPL and Creative Commons ShareAlike.
Morfologik includes data from BSD-licensed dictionary of Polish (SGJP)
(http://sgjp.pl/morfeusz/)
Servlet-api.jar and javax.servlet-*.jar are under the CDDL license, the original
source code for this can be found at http://www.eclipse.org/jetty/downloads.php
===========================================================================
Kuromoji Japanese Morphological Analyzer - Apache Lucene Integration
===========================================================================
This software includes a binary and/or source version of data from
mecab-ipadic-2.7.0-20070801
which can be obtained from
http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz
or
http://jaist.dl.sourceforge.net/project/mecab/mecab-ipadic/2.7.0-20070801/mecab-ipadic-2.7.0-20070801.tar.gz
===========================================================================
mecab-ipadic-2.7.0-20070801 Notice
===========================================================================
Nara Institute of Science and Technology (NAIST),
the copyright holders, disclaims all warranties with regard to this
software, including all implied warranties of merchantability and
fitness, in no event shall NAIST be liable for
any special, indirect or consequential damages or any damages
whatsoever resulting from loss of use, data or profits, whether in an
action of contract, negligence or other tortuous action, arising out
of or in connection with the use or performance of this software.
A large portion of the dictionary entries
originate from ICOT Free Software. The following conditions for ICOT
Free Software applies to the current dictionary as well.
Each User may also freely distribute the Program, whether in its
original form or modified, to any third party or parties, PROVIDED
that the provisions of Section 3 ("NO WARRANTY") will ALWAYS appear
on, or be attached to, the Program, which is distributed substantially
in the same form as set out herein and that such intended
distribution, if actually made, will neither violate or otherwise
contravene any of the laws and regulations of the countries having
jurisdiction over the User or the intended distribution itself.
NO WARRANTY
The program was produced on an experimental basis in the course of the
research and development conducted during the project and is provided
to users as so produced on an experimental basis. Accordingly, the
program is provided without any warranty whatsoever, whether express,
implied, statutory or otherwise. The term "warranty" used herein
includes, but is not limited to, any warranty of the quality,
performance, merchantability and fitness for a particular purpose of
the program and the nonexistence of any infringement or violation of
any right of any third party.
Each user of the program will agree and understand, and be deemed to
have agreed and understood, that there is no warranty whatsoever for
the program and, accordingly, the entire risk arising from or
otherwise connected with the program is assumed by the user.
Therefore, neither ICOT, the copyright holder, or any other
organization that participated in or was otherwise related to the
development of the program and their respective officials, directors,
officers and other employees shall be held liable for any and all
damages, including, without limitation, general, special, incidental
and consequential damages, arising out of or otherwise in connection
with the use or inability to use the program or any product, material
or result produced or otherwise obtained by using the program,
regardless of whether they have been advised of, or otherwise had
knowledge of, the possibility of such damages at any time during the
project or thereafter. Each user will be deemed to have agreed to the
foregoing by his or her commencement of use of the program. The term
"use" as used herein includes, but is not limited to, the use,
modification, copying and distribution of the program and the
production of secondary products from the program.
In the case where the program, whether in its original form or
modified, was distributed or delivered to or received by a user from
any person, organization or entity other than ICOT, unless it makes or
grants independently of ICOT any specific warranty to the user in
writing, such person, organization or entity, will also be exempted
from and not be held liable to the user for any such damages as noted
above as far as the program is concerned.

pom.xml Normal file → Executable file
@ -6,10 +6,24 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-analysis-ik</artifactId>
<version>1.3.0</version>
<version>${elasticsearch.version}</version>
<packaging>jar</packaging>
<description>IK Analyzer for ElasticSearch</description>
<inceptionYear>2009</inceptionYear>
<description>IK Analyzer for Elasticsearch</description>
<inceptionYear>2011</inceptionYear>
<properties>
<elasticsearch.version>8.4.1</elasticsearch.version>
<maven.compiler.target>1.8</maven.compiler.target>
<elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor>
<elasticsearch.plugin.name>analysis-ik</elasticsearch.plugin.name>
<elasticsearch.plugin.classname>org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin</elasticsearch.plugin.classname>
<elasticsearch.plugin.jvm>true</elasticsearch.plugin.jvm>
<tests.rest.load_packaged>false</tests.rest.load_packaged>
<skip.unit.tests>true</skip.unit.tests>
<gpg.keyname>4E899B30</gpg.keyname>
<gpg.useagent>true</gpg.useagent>
</properties>
<licenses>
<license>
<name>The Apache Software License, Version 2.0</name>
@ -17,6 +31,16 @@
<distribution>repo</distribution>
</license>
</licenses>
<developers>
<developer>
<name>INFINI Labs</name>
<email>hello@infini.ltd</email>
<organization>INFINI Labs</organization>
<organizationUrl>https://infinilabs.com</organizationUrl>
</developer>
</developers>
<scm>
<connection>scm:git:git@github.com:medcl/elasticsearch-analysis-ik.git</connection>
<developerConnection>scm:git:git@github.com:medcl/elasticsearch-analysis-ik.git
@ -27,20 +51,27 @@
<parent>
<groupId>org.sonatype.oss</groupId>
<artifactId>oss-parent</artifactId>
<version>7</version>
<version>9</version>
</parent>
<properties>
<elasticsearch.version>1.5.0</elasticsearch.version>
</properties>
<distributionManagement>
<snapshotRepository>
<id>oss.sonatype.org</id>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
</snapshotRepository>
<repository>
<id>oss.sonatype.org</id>
<url>https://oss.sonatype.org/service/local/staging/deploy/maven2/</url>
</repository>
</distributionManagement>
<repositories>
<repositories>
<repository>
<id>oss.sonatype.org</id>
<name>OSS Sonatype</name>
<releases><enabled>true</enabled></releases>
<snapshots><enabled>true</enabled></snapshots>
<url>http://oss.sonatype.org/content/repositories/releases/</url>
<url>https://oss.sonatype.org/content/repositories/releases/</url>
</repository>
</repositories>
@ -51,44 +82,39 @@
<version>${elasticsearch.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.4.1</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.16</version>
<scope>runtime</scope>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.18.0</version>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-core</artifactId>
<version>1.3.RC2</version>
<version>1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-library</artifactId>
<version>1.3.RC2</version>
<version>1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.10</version>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>4.10.4</version>
</dependency>
</dependencies>
<build>
@ -96,10 +122,10 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<version>3.5.1</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
<source>${maven.compiler.target}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
<plugin>
@ -127,7 +153,9 @@
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
<outputDirectory>${project.build.directory}/releases/</outputDirectory>
<descriptors>
<descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
@ -137,9 +165,6 @@
<mainClass>fully.qualified.MainClass</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
@ -152,4 +177,93 @@
</plugin>
</plugins>
</build>
<profiles>
<profile>
<id>disable-java8-doclint</id>
<activation>
<jdk>[1.8,)</jdk>
</activation>
<properties>
<additionalparam>-Xdoclint:none</additionalparam>
</properties>
</profile>
<profile>
<id>release</id>
<build>
<plugins>
<plugin>
<groupId>org.sonatype.plugins</groupId>
<artifactId>nexus-staging-maven-plugin</artifactId>
<version>1.6.3</version>
<extensions>true</extensions>
<configuration>
<serverId>oss</serverId>
<nexusUrl>https://oss.sonatype.org/</nexusUrl>
<autoReleaseAfterClose>true</autoReleaseAfterClose>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-release-plugin</artifactId>
<version>2.1</version>
<configuration>
<autoVersionSubmodules>true</autoVersionSubmodules>
<useReleaseProfile>false</useReleaseProfile>
<releaseProfiles>release</releaseProfiles>
<goals>deploy</goals>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<configuration>
<source>${maven.compiler.target}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<version>1.5</version>
<executions>
<execution>
<id>sign-artifacts</id>
<phase>verify</phase>
<goals>
<goal>sign</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.2.1</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.9</version>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>

@ -1,13 +1,32 @@
<?xml version="1.0"?>
<assembly>
<id></id>
<id>analysis-ik-release</id>
<formats>
<format>zip</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<fileSets>
<fileSet>
<directory>${project.basedir}/config</directory>
<outputDirectory>config</outputDirectory>
</fileSet>
</fileSets>
<files>
<file>
<source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
<outputDirectory/>
<filtered>true</filtered>
</file>
<file>
<source>${project.basedir}/src/main/resources/plugin-security.policy</source>
<outputDirectory/>
<filtered>true</filtered>
</file>
</files>
<dependencySets>
<dependencySet>
<outputDirectory>/</outputDirectory>
<outputDirectory/>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<excludes>
@ -15,7 +34,7 @@
</excludes>
</dependencySet>
<dependencySet>
<outputDirectory>/</outputDirectory>
<outputDirectory/>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>
@ -23,4 +42,4 @@
</includes>
</dependencySet>
</dependencySets>
</assembly>
</assembly>

@ -1,22 +0,0 @@
package org.elasticsearch.index.analysis;
public class IkAnalysisBinderProcessor extends AnalysisModule.AnalysisBinderProcessor {
@Override public void processTokenFilters(TokenFiltersBindings tokenFiltersBindings) {
}
@Override public void processAnalyzers(AnalyzersBindings analyzersBindings) {
analyzersBindings.processAnalyzer("ik", IkAnalyzerProvider.class);
super.processAnalyzers(analyzersBindings);
}
@Override
public void processTokenizers(TokenizersBindings tokenizersBindings) {
tokenizersBindings.processTokenizer("ik", IkTokenizerFactory.class);
super.processTokenizers(tokenizersBindings);
}
}

@ -1,23 +1,28 @@
package org.elasticsearch.index.analysis;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
import org.elasticsearch.index.IndexSettings;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.dic.Dictionary;
import org.wltea.analyzer.lucene.IKAnalyzer;
public class IkAnalyzerProvider extends AbstractIndexAnalyzerProvider<IKAnalyzer> {
private final IKAnalyzer analyzer;
@Inject
public IkAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
Dictionary.initial(new Configuration(env));
analyzer=new IKAnalyzer(indexSettings, settings, env);
public IkAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings,boolean useSmart) {
super(name, settings);
Configuration configuration=new Configuration(env,settings).setUseSmart(useSmart);
analyzer=new IKAnalyzer(configuration);
}
public static IkAnalyzerProvider getIkSmartAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkAnalyzerProvider(indexSettings,env,name,settings,true);
}
public static IkAnalyzerProvider getIkAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkAnalyzerProvider(indexSettings,env,name,settings,false);
}
@Override public IKAnalyzer get() {

@ -1,33 +1,34 @@
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettings;
import org.elasticsearch.index.IndexSettings;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.dic.Dictionary;
import org.wltea.analyzer.lucene.IKTokenizer;
import java.io.Reader;
public class IkTokenizerFactory extends AbstractTokenizerFactory {
private Environment environment;
private Settings settings;
private Configuration configuration;
@Inject
public IkTokenizerFactory(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettings, name, settings);
this.environment = env;
this.settings = settings;
Dictionary.initial(new Configuration(env));
public IkTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
super(indexSettings, settings,name);
configuration=new Configuration(env,settings);
}
public static IkTokenizerFactory getIkTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkTokenizerFactory(indexSettings,env, name, settings).setSmart(false);
}
public static IkTokenizerFactory getIkSmartTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkTokenizerFactory(indexSettings,env, name, settings).setSmart(true);
}
public IkTokenizerFactory setSmart(boolean smart){
this.configuration.setUseSmart(smart);
return this;
}
@Override
public Tokenizer create(Reader reader) {
return new IKTokenizer(reader, settings, environment);
}
public Tokenizer create() {
return new IKTokenizer(configuration); }
}

@ -1,27 +1,41 @@
package org.elasticsearch.plugin.analysis.ik;
import org.elasticsearch.common.inject.Module;
import org.elasticsearch.index.analysis.AnalysisModule;
import org.elasticsearch.index.analysis.IkAnalysisBinderProcessor;
import org.elasticsearch.plugins.AbstractPlugin;
import org.apache.lucene.analysis.Analyzer;
import org.elasticsearch.index.analysis.AnalyzerProvider;
import org.elasticsearch.index.analysis.IkAnalyzerProvider;
import org.elasticsearch.index.analysis.IkTokenizerFactory;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;
import java.util.HashMap;
import java.util.Map;
public class AnalysisIkPlugin extends AbstractPlugin {
public class AnalysisIkPlugin extends Plugin implements AnalysisPlugin {
@Override public String name() {
return "analysis-ik";
public static String PLUGIN_NAME = "analysis-ik";
@Override
public Map<String, AnalysisModule.AnalysisProvider<TokenizerFactory>> getTokenizers() {
Map<String, AnalysisModule.AnalysisProvider<TokenizerFactory>> extra = new HashMap<>();
extra.put("ik_smart", IkTokenizerFactory::getIkSmartTokenizerFactory);
extra.put("ik_max_word", IkTokenizerFactory::getIkTokenizerFactory);
return extra;
}
@Override
public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> getAnalyzers() {
Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> extra = new HashMap<>();
@Override public String description() {
return "ik analysis";
extra.put("ik_smart", IkAnalyzerProvider::getIkSmartAnalyzerProvider);
extra.put("ik_max_word", IkAnalyzerProvider::getIkAnalyzerProvider);
return extra;
}
@Override public void processModule(Module module) {
if (module instanceof AnalysisModule) {
AnalysisModule analysisModule = (AnalysisModule) module;
analysisModule.addProcessor(new IkAnalysisBinderProcessor());
}
}
}
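The two maps above are the whole registration surface of the rewritten plugin: Elasticsearch looks up "ik_smart" or "ik_max_word" by name and invokes the registered provider. Below is a minimal sketch of how such a provider is consumed, assuming the ES 5+ AnalysisModule.AnalysisProvider functional interface, whose get(IndexSettings, Environment, String, Settings) signature is what makes the plugin's method references legal; ProviderSketch itself is a hypothetical helper, not part of this commit.

import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.IkTokenizerFactory;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;

final class ProviderSketch {
    static Tokenizer buildMaxWordTokenizer(IndexSettings indexSettings, Environment env,
                                           Settings settings) throws java.io.IOException {
        // The same method reference the plugin registers under "ik_max_word".
        AnalysisModule.AnalysisProvider<TokenizerFactory> provider =
                IkTokenizerFactory::getIkTokenizerFactory;
        // Elasticsearch resolves the name and calls get(...) at analysis-registry build time.
        TokenizerFactory factory = provider.get(indexSettings, env, "ik_max_word", settings);
        return factory.create();
    }
}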

src/main/java/org/wltea/analyzer/cfg/Configuration.java Normal file → Executable file
@ -3,126 +3,73 @@
*/
package org.wltea.analyzer.cfg;
import java.io.*;
import java.util.ArrayList;
import java.util.InvalidPropertiesFormatException;
import java.util.List;
import java.util.Properties;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.core.PathUtils;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin;
import org.wltea.analyzer.dic.Dictionary;
import java.io.File;
import java.nio.file.Path;
public class Configuration {
private static String FILE_NAME = "ik/IKAnalyzer.cfg.xml";
private static final String EXT_DICT = "ext_dict";
private static final String REMOTE_EXT_DICT = "remote_ext_dict";
private static final String EXT_STOP = "ext_stopwords";
private static final String REMOTE_EXT_STOP = "remote_ext_stopwords";
private static ESLogger logger = null;
private Properties props;
private Environment environment;
private Environment environment;
private Settings settings;
public Configuration(Environment env){
logger = Loggers.getLogger("ik-analyzer");
props = new Properties();
environment = env;
//whether smart (coarse-grained) segmentation is enabled
private boolean useSmart;
File fileConfig= new File(environment.configFile(), FILE_NAME);
//whether remote dictionary loading is enabled
private boolean enableRemoteDict=false;
//whether lowercase conversion is enabled
private boolean enableLowercase=true;
@Inject
public Configuration(Environment env,Settings settings) {
this.environment = env;
this.settings=settings;
this.useSmart = settings.get("use_smart", "false").equals("true");
this.enableLowercase = settings.get("enable_lowercase", "true").equals("true");
this.enableRemoteDict = settings.get("enable_remote_dict", "true").equals("true");
Dictionary.initial(this);
InputStream input = null;
try {
input = new FileInputStream(fileConfig);
} catch (FileNotFoundException e) {
logger.error("ik-analyzer",e);
}
if(input != null){
try {
props.loadFromXML(input);
} catch (InvalidPropertiesFormatException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public List<String> getExtDictionarys(){
List<String> extDictFiles = new ArrayList<String>(2);
String extDictCfg = props.getProperty(EXT_DICT);
if(extDictCfg != null){
String[] filePaths = extDictCfg.split(";");
if(filePaths != null){
for(String filePath : filePaths){
if(filePath != null && !"".equals(filePath.trim())){
File file=new File("ik",filePath.trim());
extDictFiles.add(file.toString());
}
}
}
}
return extDictFiles;
}
public List<String> getRemoteExtDictionarys(){
List<String> remoteExtDictFiles = new ArrayList<String>(2);
String remoteExtDictCfg = props.getProperty(REMOTE_EXT_DICT);
if(remoteExtDictCfg != null){
String[] filePaths = remoteExtDictCfg.split(";");
if(filePaths != null){
for(String filePath : filePaths){
if(filePath != null && !"".equals(filePath.trim())){
remoteExtDictFiles.add(filePath);
}
}
}
}
return remoteExtDictFiles;
public Path getConfigInPluginDir() {
return PathUtils
.get(new File(AnalysisIkPlugin.class.getProtectionDomain().getCodeSource().getLocation().getPath())
.getParent(), "config")
.toAbsolutePath();
}
public List<String> getExtStopWordDictionarys(){
List<String> extStopWordDictFiles = new ArrayList<String>(2);
String extStopWordDictCfg = props.getProperty(EXT_STOP);
if(extStopWordDictCfg != null){
String[] filePaths = extStopWordDictCfg.split(";");
if(filePaths != null){
for(String filePath : filePaths){
if(filePath != null && !"".equals(filePath.trim())){
File file=new File("ik",filePath.trim());
extStopWordDictFiles.add(file.toString());
}
}
}
}
return extStopWordDictFiles;
}
public List<String> getRemoteExtStopWordDictionarys(){
List<String> remoteExtStopWordDictFiles = new ArrayList<String>(2);
String remoteExtStopWordDictCfg = props.getProperty(REMOTE_EXT_STOP);
if(remoteExtStopWordDictCfg != null){
String[] filePaths = remoteExtStopWordDictCfg.split(";");
if(filePaths != null){
for(String filePath : filePaths){
if(filePath != null && !"".equals(filePath.trim())){
remoteExtStopWordDictFiles.add(filePath);
}
}
}
}
return remoteExtStopWordDictFiles;
public boolean isUseSmart() {
return useSmart;
}
public File getDictRoot() {
return environment.configFile();
}
public Configuration setUseSmart(boolean useSmart) {
this.useSmart = useSmart;
return this;
}
public Environment getEnvironment() {
return environment;
}
public Settings getSettings() {
return settings;
}
public boolean isEnableRemoteDict() {
return enableRemoteDict;
}
public boolean isEnableLowercase() {
return enableLowercase;
}
}
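The rewritten Configuration replaces XML file parsing with three boolean index settings (use_smart, enable_lowercase, enable_remote_dict), each read as a string with a default. A minimal construction sketch follows; ConfigSketch is a hypothetical helper, and the Environment instance is assumed to come from the running node, since building one by hand requires a valid path.home.

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.wltea.analyzer.cfg.Configuration;

final class ConfigSketch {
    static Configuration smartConfig(Environment env) {
        Settings settings = Settings.builder()
                .put("use_smart", true)           // coarse-grained segmentation
                .put("enable_lowercase", false)   // keep A-Z unchanged
                .put("enable_remote_dict", true)  // poll remote dictionaries
                .build();
        // The constructor also calls Dictionary.initial(this), as shown in the diff.
        return new Configuration(env, settings);
    }
}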

@ -32,6 +32,7 @@ import java.util.LinkedList;
import java.util.Map;
import java.util.Set;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.dic.Dictionary;
/**
@ -47,7 +48,7 @@ class AnalyzeContext {
private static final int BUFF_EXHAUST_CRITICAL = 100;
//character read buffer
//character read buffer
private char[] segmentBuff;
//character type array
private int[] charTypes;
@ -72,12 +73,11 @@ class AnalyzeContext {
private Map<Integer , LexemePath> pathMap;
//final segmentation result set
private LinkedList<Lexeme> results;
private boolean useSmart;
//segmenter configuration
// private Configuration cfg;
private Configuration cfg;
public AnalyzeContext(boolean useSmart){
this.useSmart = useSmart;
public AnalyzeContext(Configuration configuration){
this.cfg = configuration;
this.segmentBuff = new char[BUFF_SIZE];
this.charTypes = new int[BUFF_SIZE];
this.buffLocker = new HashSet<String>();
@ -139,7 +139,7 @@ class AnalyzeContext {
*/
void initCursor(){
this.cursor = 0;
this.segmentBuff[this.cursor] = CharacterUtil.regularize(this.segmentBuff[this.cursor]);
this.segmentBuff[this.cursor] = CharacterUtil.regularize(this.segmentBuff[this.cursor],cfg.isEnableLowercase());
this.charTypes[this.cursor] = CharacterUtil.identifyCharType(this.segmentBuff[this.cursor]);
}
@ -151,7 +151,7 @@ class AnalyzeContext {
boolean moveCursor(){
if(this.cursor < this.available - 1){
this.cursor++;
this.segmentBuff[this.cursor] = CharacterUtil.regularize(this.segmentBuff[this.cursor]);
this.segmentBuff[this.cursor] = CharacterUtil.regularize(this.segmentBuff[this.cursor],cfg.isEnableLowercase());
this.charTypes[this.cursor] = CharacterUtil.identifyCharType(this.segmentBuff[this.cursor]);
return true;
}else{
@ -267,6 +267,15 @@ class AnalyzeContext {
Lexeme l = path.pollFirst();
while(l != null){
this.results.add(l);
//the single character is not in the dictionary but lexemes overlap: emit the single characters of the lexeme preceding the overlapping one
/*int innerIndex = index + 1;
for (; innerIndex < index + l.getLength(); innerIndex++) {
Lexeme innerL = path.peekFirst();
if (innerL != null && innerIndex == innerL.getBegin()) {
this.outputSingleCJK(innerIndex - 1);
}
}*/
//move index past the lexeme
index = l.getBegin() + l.getLength();
l = path.pollFirst();
@ -345,7 +354,7 @@ class AnalyzeContext {
*/
private void compound(Lexeme result){
if(!this.useSmart){
if(!this.cfg.isUseSmart()){
return ;
}
//merge numerals with quantifiers

@ -127,14 +127,12 @@ class CN_QuantifierSegmenter implements ISegmenter{
}
//buffer exhausted while a numeral is still pending output
if(context.isBufferConsumed()){
if(nStart != -1 && nEnd != -1){
//emit the numeral lexeme
outputNumLexeme(context);
//reset the head/tail pointers
nStart = -1;
nEnd = -1;
}
if(context.isBufferConsumed() && (nStart != -1 && nEnd != -1)){
//emit the numeral lexeme
outputNumLexeme(context);
//reset the head/tail pointers
nStart = -1;
nEnd = -1;
}
}
@ -216,10 +214,9 @@ class CN_QuantifierSegmenter implements ISegmenter{
//look for an adjacent numeral
if(!context.getOrgLexemes().isEmpty()){
Lexeme l = context.getOrgLexemes().peekLast();
if(Lexeme.TYPE_CNUM == l.getLexemeType() || Lexeme.TYPE_ARABIC == l.getLexemeType()){
if(l.getBegin() + l.getLength() == context.getCursor()){
return true;
}
if((Lexeme.TYPE_CNUM == l.getLexemeType() || Lexeme.TYPE_ARABIC == l.getLexemeType())
&& (l.getBegin() + l.getLength() == context.getCursor())){
return true;
}
}
}

@ -86,14 +86,14 @@ class CharacterUtil {
* @param input
* @return char
*/
static char regularize(char input){
static char regularize(char input,boolean lowercase){
if (input == 12288) {
input = (char) 32;
}else if (input > 65280 && input < 65375) {
input = (char) (input - 65248);
}else if (input >= 'A' && input <= 'Z') {
}else if (input >= 'A' && input <= 'Z' && lowercase) {
input += 32;
}
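regularize() now makes lowercasing optional. The mapping itself is unchanged: code point 12288 is the full-width space, and code points 65281-65374 are the full-width ASCII forms, each offset from its ASCII counterpart by 65248. A self-contained, runnable restatement (not part of the commit):

public class RegularizeDemo {
    static char regularize(char input, boolean lowercase) {
        if (input == 12288) {
            input = (char) 32;               // full-width space -> ' '
        } else if (input > 65280 && input < 65375) {
            input = (char) (input - 65248);  // full-width form -> ASCII
        } else if (input >= 'A' && input <= 'Z' && lowercase) {
            input += 32;                     // optional lowercasing
        }
        return input;
    }

    public static void main(String[] args) {
        System.out.println(regularize('Z', true));         // z
        System.out.println(regularize('Ａ', true));         // A (normalized; note the else-if chain does not also lowercase it in the same pass)
        System.out.println((int) regularize('　', false));  // 32
    }
}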

@ -23,10 +23,7 @@
*/
package org.wltea.analyzer.core;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.dic.Dictionary;
import java.io.IOException;
import java.io.Reader;
@ -41,52 +38,32 @@ public final class IKSegmenter {
//character stream reader
private Reader input;
//segmenter configuration
private Configuration cfg;
//segmentation context
private AnalyzeContext context;
//list of sub-segmenters
private List<ISegmenter> segmenters;
//ambiguity arbitrator
private IKArbitrator arbitrator;
private boolean useSmart = false;
private Configuration configuration;
/**
* IK segmenter constructor
* @param input
*/
public IKSegmenter(Reader input , Settings settings, Environment environment){
public IKSegmenter(Reader input ,Configuration configuration){
this.input = input;
this.cfg = new Configuration(environment);
this.useSmart = settings.get("use_smart", "false").equals("true");
this.configuration = configuration;
this.init();
}
public IKSegmenter(Reader input){
new IKSegmenter(input, null,null);
}
// /**
// * IK segmenter constructor
// * @param input
// * @param cfg build the segmenter with a custom Configuration
// *
// */
// public IKSegmenter(Reader input , Configuration cfg){
// this.input = input;
// this.cfg = cfg;
// this.init();
// }
/**
* initialization
*/
private void init(){
//initialize the dictionary singleton
Dictionary.initial(this.cfg);
//initialize the segmentation context
this.context = new AnalyzeContext(useSmart);
this.context = new AnalyzeContext(configuration);
//load the sub-segmenters
this.segmenters = this.loadSegmenters();
//load the ambiguity arbitrator
@ -147,7 +124,7 @@ public final class IKSegmenter {
}
}
//resolve segmentation ambiguities
this.arbitrator.process(context, useSmart);
this.arbitrator.process(context, configuration.isUseSmart());
//output lexemes to the result set and handle any unsegmented single CJK characters
context.outputToResult();
//record the buffer offset of this segmentation pass
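With the Settings/Environment pair gone, driving the segmenter directly needs only a Reader and a Configuration. A usage sketch follows, assuming IKSegmenter's next() iteration method and Lexeme's getLexemeText() accessor (both part of the IK code base, though not shown in this hunk); SegmenterSketch is a hypothetical helper.

import java.io.IOException;
import java.io.StringReader;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;

final class SegmenterSketch {
    static void printLexemes(Configuration cfg, String text) throws IOException {
        IKSegmenter segmenter = new IKSegmenter(new StringReader(text), cfg);
        // next() returns null once the input is exhausted.
        for (Lexeme lexeme = segmenter.next(); lexeme != null; lexeme = segmenter.next()) {
            System.out.println(lexeme.getLexemeText()
                    + " [" + lexeme.getBeginPosition() + "," + lexeme.getEndPosition() + ")");
        }
    }
}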

@ -155,14 +155,12 @@ class LetterSegmenter implements ISegmenter {
}
//check whether the buffer has been fully read
if(context.isBufferConsumed()){
if(this.start != -1 && this.end != -1){
//buffer fully read; emit the lexeme
Lexeme newLexeme = new Lexeme(context.getBufferOffset() , this.start , this.end - this.start + 1 , Lexeme.TYPE_LETTER);
context.addLexeme(newLexeme);
this.start = -1;
this.end = -1;
}
if(context.isBufferConsumed() && (this.start != -1 && this.end != -1)){
//buffer fully read; emit the lexeme
Lexeme newLexeme = new Lexeme(context.getBufferOffset() , this.start , this.end - this.start + 1 , Lexeme.TYPE_LETTER);
context.addLexeme(newLexeme);
this.start = -1;
this.end = -1;
}
//decide whether to lock the buffer
@ -203,14 +201,12 @@ class LetterSegmenter implements ISegmenter {
}
//check whether the buffer has been fully read
if(context.isBufferConsumed()){
if(this.englishStart != -1 && this.englishEnd != -1){
//buffer fully read; emit the lexeme
Lexeme newLexeme = new Lexeme(context.getBufferOffset() , this.englishStart , this.englishEnd - this.englishStart + 1 , Lexeme.TYPE_ENGLISH);
context.addLexeme(newLexeme);
this.englishStart = -1;
this.englishEnd= -1;
}
if(context.isBufferConsumed() && (this.englishStart != -1 && this.englishEnd != -1)){
//buffer fully read; emit the lexeme
Lexeme newLexeme = new Lexeme(context.getBufferOffset() , this.englishStart , this.englishEnd - this.englishStart + 1 , Lexeme.TYPE_ENGLISH);
context.addLexeme(newLexeme);
this.englishStart = -1;
this.englishEnd= -1;
}
//decide whether to lock the buffer
@ -254,14 +250,12 @@ class LetterSegmenter implements ISegmenter {
}
//check whether the buffer has been fully read
if(context.isBufferConsumed()){
if(this.arabicStart != -1 && this.arabicEnd != -1){
//create the segmented lexeme
Lexeme newLexeme = new Lexeme(context.getBufferOffset() , this.arabicStart , this.arabicEnd - this.arabicStart + 1 , Lexeme.TYPE_ARABIC);
context.addLexeme(newLexeme);
this.arabicStart = -1;
this.arabicEnd = -1;
}
if(context.isBufferConsumed() && (this.arabicStart != -1 && this.arabicEnd != -1)){
//create the segmented lexeme
Lexeme newLexeme = new Lexeme(context.getBufferOffset() , this.arabicStart , this.arabicEnd - this.arabicStart + 1 , Lexeme.TYPE_ARABIC);
context.addLexeme(newLexeme);
this.arabicStart = -1;
this.arabicEnd = -1;
}
//decide whether to lock the buffer

@ -57,7 +57,7 @@ class DictSegment implements Comparable<DictSegment>{
DictSegment(Character nodeChar){
if(nodeChar == null){
throw new IllegalArgumentException("参数为空异常,字符不能为空");
throw new IllegalArgumentException("node char cannot be empty");
}
this.nodeChar = nodeChar;
}
@ -115,7 +115,7 @@ class DictSegment implements Comparable<DictSegment>{
//set the hit's current processing position
searchHit.setEnd(begin);
Character keyChar = new Character(charArray[begin]);
Character keyChar = Character.valueOf(charArray[begin]);
DictSegment ds = null;
//copy the instance field into a local variable to avoid synchronization issues if it is updated during lookup
@ -187,7 +187,7 @@ class DictSegment implements Comparable<DictSegment>{
*/
private synchronized void fillSegment(char[] charArray , int begin , int length , int enabled){
//fetch the character object from the dictionary char map
Character beginChar = new Character(charArray[begin]);
Character beginChar = Character.valueOf(charArray[begin]);
Character keyChar = charMap.get(beginChar);
//if the character is not in the dictionary, add it
if(keyChar == null){
@ -280,11 +280,9 @@ class DictSegment implements Comparable<DictSegment>{
* thread-synchronized method
*/
private DictSegment[] getChildrenArray(){
if(this.childrenArray == null){
synchronized(this){
if(this.childrenArray == null){
synchronized(this){
if(this.childrenArray == null){
this.childrenArray = new DictSegment[ARRAY_LENGTH_LIMIT];
}
}
}
return this.childrenArray;
@ -295,11 +293,9 @@ class DictSegment implements Comparable<DictSegment>{
* thread-synchronized method
*/
private Map<Character , DictSegment> getChildrenMap(){
if(this.childrenMap == null){
synchronized(this){
if(this.childrenMap == null){
this.childrenMap = new ConcurrentHashMap<Character, DictSegment>(ARRAY_LENGTH_LIMIT * 2,0.8f);
}
synchronized(this){
if(this.childrenMap == null){
this.childrenMap = new ConcurrentHashMap<Character, DictSegment>(ARRAY_LENGTH_LIMIT * 2,0.8f);
}
}
return this.childrenMap;
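Both hunks above drop a broken double-checked lock: without a volatile field, the unsynchronized outer null check is not guaranteed safe under the Java Memory Model, so the commit simply synchronizes every call. If the uncontended fast path mattered, the classic safe variant relies on volatile; the following fragment is a sketch of that alternative (not what the commit does), written against the class's existing childrenArray field and ARRAY_LENGTH_LIMIT constant:

// The field must be declared volatile for this idiom to be correct.
private volatile DictSegment[] childrenArray;

private DictSegment[] getChildrenArraySafe() {
    DictSegment[] local = this.childrenArray;        // single volatile read
    if (local == null) {
        synchronized (this) {
            local = this.childrenArray;
            if (local == null) {
                local = new DictSegment[ARRAY_LENGTH_LIMIT];
                this.childrenArray = local;          // safe publication via volatile write
            }
        }
    }
    return local;
}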

src/main/java/org/wltea/analyzer/dic/Dictionary.java Normal file → Executable file
File diff suppressed because it is too large.
@ -1,15 +1,22 @@
package org.wltea.analyzer.dic;
import java.io.IOException;
import java.security.AccessController;
import java.security.PrivilegedAction;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.SpecialPermission;
import org.wltea.analyzer.help.ESPluginLoggerFactory;
public class Monitor implements Runnable {
private static final Logger logger = ESPluginLoggerFactory.getLogger(Monitor.class.getName());
private static CloseableHttpClient httpclient = HttpClients.createDefault();
/*
* last modified time
@ -19,17 +26,26 @@ public class Monitor implements Runnable {
* resource attribute (ETag)
*/
private String eTags;
/*
* request URL
*/
private String location;
private String location;
public Monitor(String location) {
this.location = location;
this.last_modified = null;
this.eTags = null;
}
public void run() {
SpecialPermission.check();
AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
this.runUnprivileged();
return null;
});
}
/**
* Monitoring procedure:
* send a HEAD request to the dictionary server
@ -38,16 +54,16 @@ public class Monitor implements Runnable {
* reload the dictionary if it has changed
* sleep for 1 min, then return to step one
*/
public void run() {
public void runUnprivileged() {
//timeout settings
RequestConfig rc = RequestConfig.custom().setConnectionRequestTimeout(10*1000)
.setConnectTimeout(10*1000).setSocketTimeout(15*1000).build();
HttpHead head = new HttpHead(location);
head.setConfig(rc);
//set the request headers
if (last_modified != null) {
head.setHeader("If-Modified-Since", last_modified);
@ -55,38 +71,41 @@ public class Monitor implements Runnable {
if (eTags != null) {
head.setHeader("If-None-Match", eTags);
}
CloseableHttpResponse response = null;
try {
response = httpclient.execute(head);
//only act on a 200 response
if(response.getStatusLine().getStatusCode()==200){
if (!response.getLastHeader("Last-Modified").getValue().equalsIgnoreCase(last_modified)
||!response.getLastHeader("ETags").getValue().equalsIgnoreCase(eTags)) {
if (((response.getLastHeader("Last-Modified")!=null) && !response.getLastHeader("Last-Modified").getValue().equalsIgnoreCase(last_modified))
||((response.getLastHeader("ETag")!=null) && !response.getLastHeader("ETag").getValue().equalsIgnoreCase(eTags))) {
// the remote dictionary changed: reload it and update last_modified and eTags
Dictionary.getSingleton().reLoadMainDict();
last_modified = response.getLastHeader("Last-Modified")==null?null:response.getLastHeader("Last-Modified").getValue();
eTags = response.getLastHeader("ETags")==null?null:response.getLastHeader("ETags").getValue();
eTags = response.getLastHeader("ETag")==null?null:response.getLastHeader("ETag").getValue();
}
}else if (response.getStatusLine().getStatusCode()==304) {
//没有修改不做操作
//noop
}else{
Dictionary.logger.info("remote_ext_dict {} return bad code {}" , location , response.getStatusLine().getStatusCode() );
logger.info("remote_ext_dict {} return bad code {}" , location , response.getStatusLine().getStatusCode() );
}
} catch (Exception e) {
Dictionary.logger.error("remote_ext_dict {} error!",e , location);
logger.error("remote_ext_dict {} error!",e , location);
}finally{
try {
if (response != null) {
response.close();
}
} catch (IOException e) {
e.printStackTrace();
logger.error(e.getMessage(), e);
}
}
}
}
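Monitor now wraps its HTTP work in AccessController.doPrivileged (required under the Elasticsearch security manager; note the plugin-security.policy file packaged by the assembly descriptor earlier in this diff) and fixes the response header name to ETag. The conditional-request pattern it relies on, reduced to a sketch with hypothetical names: a 304 means the validators still match and there is nothing to reload, while a 200 means the resource may have changed, so the caller should compare Last-Modified/ETag values as the class above does.

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

final class ConditionalHeadSketch {
    static boolean mayHaveChanged(String url, String lastModified, String eTag) throws Exception {
        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpHead head = new HttpHead(url);
            if (lastModified != null) head.setHeader("If-Modified-Since", lastModified);
            if (eTag != null) head.setHeader("If-None-Match", eTag);
            try (CloseableHttpResponse response = client.execute(head)) {
                // 304 Not Modified -> false; 200 -> validators did not match on the server
                return response.getStatusLine().getStatusCode() == 200;
            }
        }
    }
}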

@ -0,0 +1,27 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.spi.ExtendedLogger;
public class ESPluginLoggerFactory {
private ESPluginLoggerFactory() {
}
static public Logger getLogger(String name) {
return getLogger("", LogManager.getLogger(name));
}
static public Logger getLogger(String prefix, String name) {
return getLogger(prefix, LogManager.getLogger(name));
}
static public Logger getLogger(String prefix, Class<?> clazz) {
return getLogger(prefix, LogManager.getLogger(clazz.getName()));
}
static public Logger getLogger(String prefix, Logger logger) {
return (Logger)(prefix != null && prefix.length() != 0 ? new PrefixPluginLogger((ExtendedLogger)logger, logger.getName(), prefix) : logger);
}
}

@ -0,0 +1,48 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;
import org.apache.logging.log4j.message.Message;
import org.apache.logging.log4j.message.MessageFactory;
import org.apache.logging.log4j.spi.ExtendedLogger;
import org.apache.logging.log4j.spi.ExtendedLoggerWrapper;
import java.util.WeakHashMap;
public class PrefixPluginLogger extends ExtendedLoggerWrapper {
private static final WeakHashMap<String, Marker> markers = new WeakHashMap();
private final Marker marker;
static int markersSize() {
return markers.size();
}
public String prefix() {
return this.marker.getName();
}
PrefixPluginLogger(ExtendedLogger logger, String name, String prefix) {
super(logger, name, (MessageFactory) null);
String actualPrefix = prefix == null ? "" : prefix;
WeakHashMap var6 = markers;
MarkerManager.Log4jMarker actualMarker;
synchronized (markers) {
MarkerManager.Log4jMarker maybeMarker = (MarkerManager.Log4jMarker) markers.get(actualPrefix);
if (maybeMarker == null) {
actualMarker = new MarkerManager.Log4jMarker(actualPrefix);
markers.put(new String(actualPrefix), actualMarker);
} else {
actualMarker = maybeMarker;
}
}
this.marker = (Marker) actualMarker;
}
public void logMessage(String fqcn, Level level, Marker marker, Message message, Throwable t) {
assert marker == null;
super.logMessage(fqcn, level, this.marker, message, t);
}
}

@ -1,30 +1,38 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.Logger;
public class Sleep {
public enum Type{MSEC,SEC,MIN,HOUR};
public static void sleep(Type type,int num){
try {
switch(type){
case MSEC:
Thread.sleep(num);
return;
case SEC:
Thread.sleep(num*1000);
return;
case MIN:
Thread.sleep(num*60*1000);
return;
case HOUR:
Thread.sleep(num*60*60*1000);
return;
default:
System.err.println("输入类型错误应为MSEC,SEC,MIN,HOUR之一");
return;
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
private static final Logger logger = ESPluginLoggerFactory.getLogger(Sleep.class.getName());
public enum Type {MSEC, SEC, MIN, HOUR}
;
public static void sleep(Type type, int num) {
try {
switch (type) {
case MSEC:
Thread.sleep(num);
return;
case SEC:
Thread.sleep(num * 1000);
return;
case MIN:
Thread.sleep(num * 60 * 1000);
return;
case HOUR:
Thread.sleep(num * 60 * 60 * 1000);
return;
default:
System.err.println("输入类型错误应为MSEC,SEC,MIN,HOUR之一");
return;
}
} catch (InterruptedException e) {
logger.error(e.getMessage(), e);
}
}
}

@ -24,13 +24,9 @@
*/
package org.wltea.analyzer.lucene;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.wltea.analyzer.cfg.Configuration;
/**
* IK implementation of the Lucene Analyzer interface
@ -38,15 +34,7 @@ import org.elasticsearch.env.Environment;
*/
public final class IKAnalyzer extends Analyzer{
private boolean useSmart;
public boolean useSmart() {
return useSmart;
}
public void setUseSmart(boolean useSmart) {
this.useSmart = useSmart;
}
private Configuration configuration;
/**
* IK implementation class of the Lucene Analyzer interface
@ -54,35 +42,26 @@ public final class IKAnalyzer extends Analyzer{
* fine-grained segmentation by default
*/
public IKAnalyzer(){
this(false);
}
/**
/**
* IK implementation class of the Lucene Analyzer interface
*
* @param useSmart when true, the segmenter performs smart (coarse-grained) segmentation
* @param configuration the IK configuration
*/
public IKAnalyzer(boolean useSmart){
public IKAnalyzer(Configuration configuration){
super();
this.useSmart = useSmart;
this.configuration = configuration;
}
Settings settings=ImmutableSettings.EMPTY;
Environment environment=new Environment();
public IKAnalyzer(Settings indexSetting,Settings settings, Environment environment) {
super();
this.settings=settings;
this.environment= environment;
}
/**
* Overrides the Analyzer interface to build the tokenization components
*/
@Override
protected TokenStreamComponents createComponents(String fieldName, final Reader in) {
Tokenizer _IKTokenizer = new IKTokenizer(in , settings, environment);
protected TokenStreamComponents createComponents(String fieldName) {
Tokenizer _IKTokenizer = new IKTokenizer(configuration);
return new TokenStreamComponents(_IKTokenizer);
}
}
}

@ -32,6 +32,7 @@ import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;
@ -64,16 +65,15 @@ public final class IKTokenizer extends Tokenizer {
/**
* Lucene 4.0 Tokenizer adapter class constructor
* @param in
*/
public IKTokenizer(Reader in , Settings settings, Environment environment){
super(in);
public IKTokenizer(Configuration configuration){
super();
offsetAtt = addAttribute(OffsetAttribute.class);
termAtt = addAttribute(CharTermAttribute.class);
typeAtt = addAttribute(TypeAttribute.class);
posIncrAtt = addAttribute(PositionIncrementAttribute.class);
_IKImplement = new IKSegmenter(input , settings, environment);
_IKImplement = new IKSegmenter(input,configuration);
}
/* (non-Javadoc)
@ -95,7 +95,6 @@ public final class IKTokenizer extends Tokenizer {
//set the lexeme length
termAtt.setLength(nextLexeme.getLength());
//set the lexeme offsets
// offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
offsetAtt.setOffset(correctOffset(nextLexeme.getBeginPosition()), correctOffset(nextLexeme.getEndPosition()));
//record the final position of this segmentation
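IKTokenizer now extends the Reader-less Lucene 5+ Tokenizer base class and runs every offset through correctOffset(). A standard consumption loop against the new API, assuming a valid Configuration from the plugin wiring; TokenizeSketch is a hypothetical helper, not part of the commit.

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.lucene.IKAnalyzer;

final class TokenizeSketch {
    static void tokenize(Configuration cfg, String text) throws IOException {
        try (Analyzer analyzer = new IKAnalyzer(cfg);
             TokenStream stream = analyzer.tokenStream("field", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = stream.addAttribute(OffsetAttribute.class);
            stream.reset();                      // mandatory before incrementToken()
            while (stream.incrementToken()) {
                System.out.println(term + " @ " + offset.startOffset() + "-" + offset.endOffset());
            }
            stream.end();                        // finalize offset state
        }
    }
}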

@ -1,712 +0,0 @@
/**
* IK Chinese word segmentation, version 5.0
* IK Analyzer release 5.0
*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* source code provided by Lin Liangyi (linliangyi2005@gmail.com)
* copyright notice: 2012 Oolong Studio
* provided by Linliangyi and copyright 2012 by Oolong studio
*
*/
package org.wltea.analyzer.query;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.util.BytesRef;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Stack;
/**
* Simple IK query-expression parser,
* built on the SWMCQuery algorithm
*
* Example expression:
* (id='1231231' && title:'monkey') || (content:'你好吗' || ulr='www.ik.com') - name:'helloword'
* @author linliangyi
*
*/
public class IKQueryExpressionParser {
//public static final String LUCENE_SPECIAL_CHAR = "&&||-()':={}[],";
private List<Element> elements = new ArrayList<Element>();
private Stack<Query> querys = new Stack<Query>();
private Stack<Element> operates = new Stack<Element>();
/**
* Parses a query expression into a Lucene Query object
*
* @param expression
* @param quickMode
* @return Lucene query
*/
public Query parseExp(String expression , boolean quickMode){
Query lucenceQuery = null;
if(expression != null && !"".equals(expression)){
try{
//lexical analysis
this.splitElements(expression);
//syntax analysis
this.parseSyntax(quickMode);
if(this.querys.size() == 1){
lucenceQuery = this.querys.pop();
}else{
throw new IllegalStateException("表达式异常: 缺少逻辑操作符 或 括号缺失");
}
}finally{
elements.clear();
querys.clear();
operates.clear();
}
}
return lucenceQuery;
}
/**
* Lexical analysis of the expression
* @param expression
*/
private void splitElements(String expression){
if(expression == null){
return;
}
Element currentElement = null;
char[] expChars = expression.toCharArray();
for(int i = 0 ; i < expChars.length ; i++){
switch(expChars[i]){
case '&' :
case '|' :
//logical operators: '&&' and '||' accumulate two identical characters
if(currentElement == null){
currentElement = new Element();
currentElement.type = expChars[i];
currentElement.append(expChars[i]);
}else if(currentElement.type == expChars[i]){
currentElement.append(expChars[i]);
this.elements.add(currentElement);
currentElement = null;
}else if(currentElement.type == '\''){
//inside a quoted value the character is plain text
currentElement.append(expChars[i]);
}else {
this.elements.add(currentElement);
currentElement = new Element();
currentElement.type = expChars[i];
currentElement.append(expChars[i]);
}
break;
case '-' :
case '(' :
case ')' :
case ':' :
case '=' :
case '[' :
case ']' :
case '{' :
case '}' :
case ',' :
//single-character operators all follow the same pattern
if(currentElement != null){
if(currentElement.type == '\''){
//inside a quoted value the character is plain text
currentElement.append(expChars[i]);
continue;
}else{
this.elements.add(currentElement);
}
}
currentElement = new Element();
currentElement.type = expChars[i];
currentElement.append(expChars[i]);
this.elements.add(currentElement);
currentElement = null;
break;
case ' ' :
//whitespace ends the current element unless it is quoted
if(currentElement != null){
if(currentElement.type == '\''){
currentElement.append(expChars[i]);
}else{
this.elements.add(currentElement);
currentElement = null;
}
}
break;
case '\'' :
//quote characters toggle quoted-value mode; the quotes themselves are not stored
if(currentElement == null){
currentElement = new Element();
currentElement.type = '\'';
}else if(currentElement.type == '\''){
this.elements.add(currentElement);
currentElement = null;
}else{
this.elements.add(currentElement);
currentElement = new Element();
currentElement.type = '\'';
}
break;
default :
//any other character belongs to a field name ('F') or a quoted value
if(currentElement == null){
currentElement = new Element();
currentElement.type = 'F';
currentElement.append(expChars[i]);
}else if(currentElement.type == 'F'){
currentElement.append(expChars[i]);
}else if(currentElement.type == '\''){
currentElement.append(expChars[i]);
}else{
this.elements.add(currentElement);
currentElement = new Element();
currentElement.type = 'F';
currentElement.append(expChars[i]);
}
}
}
if(currentElement != null){
this.elements.add(currentElement);
currentElement = null;
}
}
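//worked example: "id='1231231' && title:'monkey'" is split into seven
//elements: id (F), =, 1231231 ('), && (&), title (F), : (:), monkey (')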
/**
* Syntactic parse of the element list
*
*/
private void parseSyntax(boolean quickMode){
for(int i = 0 ; i < this.elements.size() ; i++){
Element e = this.elements.get(i);
if('F' == e.type){
if(i + 2 >= this.elements.size()){
throw new IllegalStateException("Expression error: incomplete field expression");
}
Element e2 = this.elements.get(i + 1);
if('=' != e2.type && ':' != e2.type){
throw new IllegalStateException("Expression error: missing '=' or ':'");
}
Element e3 = this.elements.get(i + 2);
//handle the '=' and ':' operations
if('\'' == e3.type){
i+=2;
if('=' == e2.type){
TermQuery tQuery = new TermQuery(new Term(e.toString() , e3.toString()));
this.querys.push(tQuery);
}else if(':' == e2.type){
String keyword = e3.toString();
//SWMCQuery here
Query _SWMCQuery = SWMCQueryBuilder.create(e.toString(), keyword , quickMode);
this.querys.push(_SWMCQuery);
}
}else if('[' == e3.type || '{' == e3.type){
i+=2;
//handle the '[]' and '{}' range syntax
LinkedList<Element> eQueue = new LinkedList<Element>();
eQueue.add(e3);
for( i++ ; i < this.elements.size() ; i++){
Element eN = this.elements.get(i);
eQueue.add(eN);
if(']' == eN.type || '}' == eN.type){
break;
}
}
//translate the queue into a RangeQuery
Query rangeQuery = this.toTermRangeQuery(e , eQueue);
this.querys.push(rangeQuery);
}else{
throw new IllegalStateException("Expression error: missing match value");
}
}else if('(' == e.type){
this.operates.push(e);
}else if(')' == e.type){
boolean doPop = true;
while(doPop && !this.operates.empty()){
Element op = this.operates.pop();
if('(' == op.type){
doPop = false;
}else {
Query q = toBooleanQuery(op);
this.querys.push(q);
}
}
}else{
if(this.operates.isEmpty()){
this.operates.push(e);
}else{
boolean doPeek = true;
while(doPeek && !this.operates.isEmpty()){
Element eleOnTop = this.operates.peek();
if('(' == eleOnTop.type){
doPeek = false;
this.operates.push(e);
}else if(compare(e , eleOnTop) == 1){
this.operates.push(e);
doPeek = false;
}else{
//equal or lower precedence: reduce the operator on top of the stack first
Query q = toBooleanQuery(eleOnTop);
this.operates.pop();
this.querys.push(q);
}
}
if(doPeek && this.operates.empty()){
this.operates.push(e);
}
}
}
}
while(!this.operates.isEmpty()){
Element eleOnTop = this.operates.pop();
Query q = toBooleanQuery(eleOnTop);
this.querys.push(q);
}
}
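//example: in "a:'1' && b:'2' || c:'3'" the '&&' on the operator stack is
//reduced before '||' is pushed (compare('|','&') returns -1), so the
//expression is evaluated as (a AND b) OR c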
/**
* Builds a BooleanQuery for a logical operator
* @param op
* @return
*/
private Query toBooleanQuery(Element op){
if(this.querys.size() == 0){
return null;
}
BooleanQuery resultQuery = new BooleanQuery();
if(this.querys.size() == 1){
return this.querys.get(0);
}
Query q2 = this.querys.pop();
Query q1 = this.querys.pop();
if('&' == op.type){
if(q1 != null){
if(q1 instanceof BooleanQuery){
BooleanClause[] clauses = ((BooleanQuery)q1).getClauses();
if(clauses.length > 0
&& clauses[0].getOccur() == Occur.MUST){
for(BooleanClause c : clauses){
resultQuery.add(c);
}
}else{
resultQuery.add(q1,Occur.MUST);
}
}else{
//q1 instanceof TermQuery
//q1 instanceof TermRangeQuery
//q1 instanceof PhraseQuery
//others
resultQuery.add(q1,Occur.MUST);
}
}
if(q2 != null){
if(q2 instanceof BooleanQuery){
BooleanClause[] clauses = ((BooleanQuery)q2).getClauses();
if(clauses.length > 0
&& clauses[0].getOccur() == Occur.MUST){
for(BooleanClause c : clauses){
resultQuery.add(c);
}
}else{
resultQuery.add(q2,Occur.MUST);
}
}else{
//q2 instanceof TermQuery
//q2 instanceof TermRangeQuery
//q2 instanceof PhraseQuery
//others
resultQuery.add(q2,Occur.MUST);
}
}
}else if('|' == op.type){
if(q1 != null){
if(q1 instanceof BooleanQuery){
BooleanClause[] clauses = ((BooleanQuery)q1).getClauses();
if(clauses.length > 0
&& clauses[0].getOccur() == Occur.SHOULD){
for(BooleanClause c : clauses){
resultQuery.add(c);
}
}else{
resultQuery.add(q1,Occur.SHOULD);
}
}else{
//q1 instanceof TermQuery
//q1 instanceof TermRangeQuery
//q1 instanceof PhraseQuery
//others
resultQuery.add(q1,Occur.SHOULD);
}
}
if(q2 != null){
if(q2 instanceof BooleanQuery){
BooleanClause[] clauses = ((BooleanQuery)q2).getClauses();
if(clauses.length > 0
&& clauses[0].getOccur() == Occur.SHOULD){
for(BooleanClause c : clauses){
resultQuery.add(c);
}
}else{
resultQuery.add(q2,Occur.SHOULD);
}
}else{
//q2 instanceof TermQuery
//q2 instanceof TermRangeQuery
//q2 instanceof PhraseQuery
//others
resultQuery.add(q2,Occur.SHOULD);
}
}
}else if('-' == op.type){
if(q1 == null || q2 == null){
throw new IllegalStateException("表达式异常SubQuery 个数不匹配");
}
if(q1 instanceof BooleanQuery){
BooleanClause[] clauses = ((BooleanQuery)q1).getClauses();
if(clauses.length > 0){
for(BooleanClause c : clauses){
resultQuery.add(c);
}
}else{
resultQuery.add(q1,Occur.MUST);
}
}else{
//q1 instanceof TermQuery
//q1 instanceof TermRangeQuery
//q1 instanceof PhraseQuery
//others
resultQuery.add(q1,Occur.MUST);
}
resultQuery.add(q2,Occur.MUST_NOT);
}
return resultQuery;
}
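//note: when a sub-query is itself a BooleanQuery whose clauses already carry
//the Occur being applied, its clauses are merged into the new query instead
//of being nested, which keeps the resulting query tree flat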
/**
* Assembles a TermRangeQuery
* @param fieldNameEle
* @param elements
* @return
*/
private TermRangeQuery toTermRangeQuery(Element fieldNameEle , LinkedList<Element> elements){
boolean includeFirst = false;
boolean includeLast = false;
String firstValue = null;
String lastValue = null;
//check whether the first element is '[' or '{'
Element first = elements.getFirst();
if('[' == first.type){
includeFirst = true;
}else if('{' == first.type){
includeFirst = false;
}else {
throw new IllegalStateException("Expression error: RangeQuery is missing its opening bracket");
}
//check whether the last element is ']' or '}'
Element last = elements.getLast();
if(']' == last.type){
includeLast = true;
}else if('}' == last.type){
includeLast = false;
}else {
throw new IllegalStateException("Expression error: RangeQuery is missing its closing bracket");
}
if(elements.size() < 4 || elements.size() > 5){
throw new IllegalStateException("Expression error: malformed RangeQuery");
}
//read the middle part
Element e2 = elements.get(1);
if('\'' == e2.type){
firstValue = e2.toString();
//
Element e3 = elements.get(2);
if(',' != e3.type){
throw new IllegalStateException("Expression error: RangeQuery is missing its comma separator");
}
//
Element e4 = elements.get(3);
if('\'' == e4.type){
lastValue = e4.toString();
}else if(e4 != last){
throw new IllegalStateException("Expression error: malformed RangeQuery");
}
}else if(',' == e2.type){
firstValue = null;
//
Element e3 = elements.get(2);
if('\'' == e3.type){
lastValue = e3.toString();
}else{
throw new IllegalStateException("Expression error: malformed RangeQuery");
}
}else {
throw new IllegalStateException("Expression error: malformed RangeQuery");
}
//an open-ended bound must stay null: new BytesRef(null) would throw a NullPointerException
return new TermRangeQuery(fieldNameEle.toString() ,
firstValue == null ? null : new BytesRef(firstValue) ,
lastValue == null ? null : new BytesRef(lastValue) ,
includeFirst , includeLast);
}
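//example: date:{'20010101','20110101'} becomes a TermRangeQuery on field
//"date" with both bounds exclusive; '[' and ']' would make them inclusive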
/**
* Compares operator precedence
* @param e1
* @param e2
* @return
*/
private int compare(Element e1 , Element e2){
if('&' == e1.type){
if('&' == e2.type){
return 0;
}else {
return 1;
}
}else if('|' == e1.type){
if('&' == e2.type){
return -1;
}else if('|' == e2.type){
return 0;
}else{
return 1;
}
}else{
if('-' == e2.type){
return 0;
}else{
return -1;
}
}
}
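//resulting precedence: '&&' binds tightest, then '||', then '-';
//a return value of 0 (equal precedence) reduces left to right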
/**
* An expression element: an operator, a field name, or a field value
* @author linliangyi
* May 20, 2010
*/
private class Element{
char type = 0;
StringBuffer eleTextBuff;
public Element(){
eleTextBuff = new StringBuffer();
}
public void append(char c){
this.eleTextBuff.append(c);
}
public String toString(){
return this.eleTextBuff.toString();
}
}
public static void main(String[] args){
IKQueryExpressionParser parser = new IKQueryExpressionParser();
//String ikQueryExp = "newsTitle:'的两款《魔兽世界》插件Bigfoot和月光宝盒'";
String ikQueryExp = "(id='ABcdRf' && date:{'20010101','20110101'} && keyword:'魔兽中国') || (content:'KSHT-KSH-A001-18' || ulr='www.ik.com') - name:'林良益'";
Query result = parser.parseExp(ikQueryExp , true);
System.out.println(result);
}
}

View File

@ -1,154 +0,0 @@
/**
* IK 中文分词 版本 5.0
* IK Analyzer release 5.0
*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Source code provided by Lin Liangyi (linliangyi2005@gmail.com)
* Copyright 2012 Oolong Studio
*
*/
package org.wltea.analyzer.query;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.core.IKSegmenter;
import org.wltea.analyzer.core.Lexeme;
/**
* Single Word Multi Char Query Builder
* dedicated to the IK segmentation algorithm
* @author linliangyi
*
*/
public class SWMCQueryBuilder {
/**
* Builds a SWMCQuery
* @param fieldName
* @param keywords
* @param quickMode
* @return Lucene Query
*/
public static Query create(String fieldName ,String keywords , boolean quickMode){
if(fieldName == null || keywords == null){
throw new IllegalArgumentException("Parameters fieldName and keywords must not be null.");
}
//1. analyze the keywords into lexemes
List<Lexeme> lexemes = doAnalyze(keywords);
//2. build the SWMCQuery from the analysis result
Query _SWMCQuery = getSWMCQuery(fieldName , lexemes , quickMode);
return _SWMCQuery;
}
/**
* Segments the keywords and returns the lexeme list
* @param keywords
* @return
*/
private static List<Lexeme> doAnalyze(String keywords){
List<Lexeme> lexemes = new ArrayList<Lexeme>();
IKSegmenter ikSeg = new IKSegmenter(new StringReader(keywords));
try{
Lexeme l = null;
while( (l = ikSeg.next()) != null){
lexemes.add(l);
}
}catch(IOException e){
e.printStackTrace();
}
return lexemes;
}
/**
* Builds the SWMC query from the segmentation result
* @param fieldName
* @param lexemes
* @param quickMode
* @return
*/
private static Query getSWMCQuery(String fieldName , List<Lexeme> lexemes , boolean quickMode){
//buffer for the full SWMC query expression
StringBuffer keywordBuffer = new StringBuffer();
//buffer for the condensed SWMC query expression
StringBuffer keywordBuffer_Short = new StringBuffer();
//length of the previous lexeme
int lastLexemeLength = 0;
//end position of the previous lexeme
int lastLexemeEnd = -1;
int shortCount = 0;
int totalCount = 0;
for(Lexeme l : lexemes){
totalCount += l.getLength();
//the condensed expression keeps only multi-character lexemes
if(l.getLength() > 1){
keywordBuffer_Short.append(' ').append(l.getLexemeText());
shortCount += l.getLength();
}
if(lastLexemeLength == 0){
keywordBuffer.append(l.getLexemeText());
}else if(lastLexemeLength == 1 && l.getLength() == 1
&& lastLexemeEnd == l.getBeginPosition()){//merge adjacent single-character lexemes
keywordBuffer.append(l.getLexemeText());
}else{
keywordBuffer.append(' ').append(l.getLexemeText());
}
lastLexemeLength = l.getLength();
lastLexemeEnd = l.getEndPosition();
}
//build the SWMC Query with the Lucene QueryParser
QueryParser qp = new QueryParser(Version.LUCENE_40, fieldName, new StandardAnalyzer(Version.LUCENE_40));
qp.setDefaultOperator(QueryParser.AND_OPERATOR);
qp.setAutoGeneratePhraseQueries(true);
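//quick mode: when multi-character lexemes cover more than half of the input,
//search with the condensed expression instead of the full one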
if(quickMode && (shortCount * 1.0f / totalCount) > 0.5f){
try {
//System.out.println(keywordBuffer.toString());
Query q = qp.parse(keywordBuffer_Short.toString());
return q;
} catch (ParseException e) {
e.printStackTrace();
}
}else{
if(keywordBuffer.length() > 0){
try {
//System.out.println(keywordBuffer.toString());
Query q = qp.parse(keywordBuffer.toString());
return q;
} catch (ParseException e) {
e.printStackTrace();
}
}
}
return null;
}
}

View File

@ -1,86 +0,0 @@
/**
* IK 中文分词 版本 5.0.1
* IK Analyzer release 5.0.1
*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Source code provided by Lin Liangyi (linliangyi2005@gmail.com)
* Copyright 2012 Oolong Studio
*
*
*/
package org.wltea.analyzer.sample;
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;
/**
* A demonstration of tokenizing with IKAnalyzer
* 2012-10-22
*
*/
public class IKAnalzyerDemo {
public static void main(String[] args){
//build the IK analyzer, using smart segmentation mode
Analyzer analyzer = new IKAnalyzer(true);
//obtain the Lucene TokenStream
TokenStream ts = null;
try {
ts = analyzer.tokenStream("myfield", new StringReader("WORLD ,.. html DATA</html>HELLO"));
// ts = analyzer.tokenStream("myfield", new StringReader("这是一个中文分词的例子你可以直接运行它IKAnalyer can analysis english text too"));
//get the token offset attribute
OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
//get the token text attribute
CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
//get the token type attribute
TypeAttribute type = ts.addAttribute(TypeAttribute.class);
//reset the TokenStream (this also resets the StringReader)
ts.reset();
//iterate over the tokenization results
while (ts.incrementToken()) {
System.out.println(offset.startOffset() + " - " + offset.endOffset() + " : " + term.toString() + " | " + type.type());
}
//signal the end of the TokenStream
ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
} catch (IOException e) {
e.printStackTrace();
} finally {
//release all TokenStream resources (this also closes the StringReader)
if(ts != null){
try {
ts.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}

View File

@ -1,147 +0,0 @@
/**
* IK 中文分词 版本 5.0
* IK Analyzer release 5.0
*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* Source code provided by Lin Liangyi (linliangyi2005@gmail.com)
* Copyright 2012 Oolong Studio
*
*
*/
package org.wltea.analyzer.sample;
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;
/**
* A demonstration of Lucene indexing and search using IKAnalyzer
* 2012-3-2
*
* Written against the Lucene 4.0 API
*
*/
public class LuceneIndexAndSearchDemo {
/**
* Simulation:
* creates an index over a single record, then searches it
* @param args
*/
public static void main(String[] args){
//Lucene document field name
String fieldName = "text";
//content to index
String text = "IK Analyzer是一个结合词典分词和文法分词的中文分词开源工具包。它使用了全新的正向迭代最细粒度切分算法。";
//instantiate the IKAnalyzer tokenizer (true = smart segmentation mode)
Analyzer analyzer = new IKAnalyzer(true);
Directory directory = null;
IndexWriter iwriter = null;
IndexReader ireader = null;
IndexSearcher isearcher = null;
try {
//create an in-memory index
directory = new RAMDirectory();
//configure the IndexWriterConfig
IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_40 , analyzer);
iwConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
iwriter = new IndexWriter(directory , iwConfig);
//write the index
Document doc = new Document();
doc.add(new StringField("ID", "10000", Field.Store.YES));
doc.add(new TextField(fieldName, text, Field.Store.YES));
iwriter.addDocument(doc);
iwriter.close();
//search phase**********************************
//instantiate the searcher
ireader = DirectoryReader.open(directory);
isearcher = new IndexSearcher(ireader);
String keyword = "中文分词工具包";
//build the Query object with the QueryParser
QueryParser qp = new QueryParser(Version.LUCENE_40, fieldName, analyzer);
qp.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = qp.parse(keyword);
System.out.println("Query = " + query);
//fetch the 5 most relevant records
TopDocs topDocs = isearcher.search(query , 5);
System.out.println("hits: " + topDocs.totalHits);
//print the results; iterate over scoreDocs.length rather than totalHits,
//which may exceed the number of documents actually returned
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (int i = 0; i < scoreDocs.length; i++){
Document targetDoc = isearcher.doc(scoreDocs[i].doc);
System.out.println("content: " + targetDoc.toString());
}
}
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (LockObtainFailedException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParseException e) {
e.printStackTrace();
} finally{
if(ireader != null){
try {
ireader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if(directory != null){
try {
directory.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}

View File

@ -1,2 +0,0 @@
plugin=org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin
version=${project.version}

View File

@ -0,0 +1,56 @@
# Elasticsearch plugin descriptor file
# This file must exist as 'plugin-descriptor.properties' at
# the root directory of all plugins.
#
# A plugin can be 'site', 'jvm', or both.
#
### example site plugin for "foo":
#
# foo.zip <-- zip file for the plugin, with this structure:
# _site/ <-- the contents that will be served
# plugin-descriptor.properties <-- example contents below:
#
# site=true
# description=My cool plugin
# version=1.0
#
### example jvm plugin for "foo"
#
# foo.zip <-- zip file for the plugin, with this structure:
# <arbitrary name1>.jar <-- classes, resources, dependencies
# <arbitrary nameN>.jar <-- any number of jars
# plugin-descriptor.properties <-- example contents below:
#
# jvm=true
# classname=foo.bar.BazPlugin
# description=My cool plugin
# version=2.0.0-rc1
# elasticsearch.version=2.0
# java.version=1.7
#
### mandatory elements for all plugins:
#
# 'description': simple summary of the plugin
description=${project.description}
#
# 'version': plugin's version
version=${project.version}
#
# 'name': the plugin name
name=${elasticsearch.plugin.name}
#
# 'classname': the name of the class to load, fully-qualified.
classname=${elasticsearch.plugin.classname}
#
# 'java.version' version of java the code is built against
# use the system property java.specification.version
# version string must be a sequence of nonnegative decimal integers
# separated by "."'s and may have leading zeros
java.version=${maven.compiler.target}
#
# 'elasticsearch.version' version of elasticsearch compiled against
# You will have to release a new version of the plugin for each new
# elasticsearch release. This version is checked when the plugin
# is loaded so Elasticsearch will refuse to start in the presence of
# plugins with the incorrect elasticsearch.version.
elasticsearch.version=${elasticsearch.version}

View File

@ -0,0 +1,4 @@
grant {
// needed because of the hot reload functionality
permission java.net.SocketPermission "*", "connect,resolve";
};

View File

@ -1,83 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<Diagram>
<ID>JAVA</ID>
<OriginalElement>org.elasticsearch.index.analysis.IKAnalysisBinderProcessor</OriginalElement>
<nodes>
<node x="1244.0" y="553.0">org.elasticsearch.index.analysis.IKAnalysisBinderProcessor</node>
<node x="2212.0" y="489.0">org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.AnalyzersBindings</node>
<node x="1316.0" y="0.0">java.lang.Object</node>
<node x="1244.0" y="329.0">org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor</node>
<node x="616.0" y="510.0">org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.TokenFiltersBindings</node>
<node x="0.0" y="510.0">org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.CharFiltersBindings</node>
<node x="1608.0" y="510.0">org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.TokenizersBindings</node>
</nodes>
<notes />
<edges>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.TokenFiltersBindings" target="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor">
<point x="152.0" y="-77.0" />
<point x="1072.0" y="469.0" />
<point x="1347.2" y="469.0" />
<point x="-68.79999999999995" y="55.0" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.CharFiltersBindings" target="java.lang.Object">
<point x="-149.0" y="-77.0" />
<point x="149.0" y="299.0" />
<point x="1336.0" y="299.0" />
<point x="-80.0" y="139.5" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor" target="java.lang.Object">
<point x="0.0" y="-55.0" />
<point x="0.0" y="139.5" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.AnalyzersBindings" target="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor">
<point x="-180.5" y="-98.0" />
<point x="2392.5" y="459.0" />
<point x="1553.6" y="459.0" />
<point x="137.5999999999999" y="55.0" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.CharFiltersBindings" target="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor">
<point x="149.0" y="-77.0" />
<point x="447.0" y="459.0" />
<point x="1278.4" y="459.0" />
<point x="-137.5999999999999" y="55.0" />
</edge>
<edge source="org.elasticsearch.index.analysis.IKAnalysisBinderProcessor" target="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor">
<point x="0.0" y="-34.0" />
<point x="0.0" y="55.0" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.TokenFiltersBindings" target="java.lang.Object">
<point x="-152.0" y="-77.0" />
<point x="768.0" y="309.0" />
<point x="1376.0" y="309.0" />
<point x="-40.0" y="139.5" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.AnalyzersBindings" target="java.lang.Object">
<point x="180.5" y="-98.0" />
<point x="2753.5" y="299.0" />
<point x="1496.0" y="299.0" />
<point x="80.0" y="139.5" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.TokenizersBindings" target="java.lang.Object">
<point x="146.0" y="-77.0" />
<point x="2046.0" y="309.0" />
<point x="1456.0" y="309.0" />
<point x="40.0" y="139.5" />
</edge>
<edge source="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor.TokenizersBindings" target="org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor">
<point x="-146.0" y="-77.0" />
<point x="1754.0" y="469.0" />
<point x="1484.8" y="469.0" />
<point x="68.79999999999995" y="55.0" />
</edge>
</edges>
<settings layout="Hierarchic Group" zoom="1.0" x="110.5" y="89.0" />
<SelectedNodes />
<Categories>
<Category>Fields</Category>
<Category>Methods</Category>
<Category>Constructors</Category>
<Category>Inner Classes</Category>
<Category>Properties</Category>
</Categories>
</Diagram>