Compare commits


85 Commits
2.x ... master

Author SHA1 Message Date
medcl
9338c19104 update to 8.4.1 2022-09-02 18:44:03 +08:00
Medcl
0fb53ac32c
Update pom.xml
Update log4j
2022-01-19 11:59:06 +08:00
medcl
b637708ba0 update log4j 2021-12-13 09:45:53 +08:00
medcl
9c47725ea0 update for 7.14 2021-08-04 17:19:10 +08:00
Medcl
8e36b3240e
Update FUNDING.yml 2021-05-19 17:27:37 +08:00
Medcl
e0157d5f39
Update FUNDING.yml 2021-05-19 17:27:04 +08:00
Medcl
0fccc038e2
Create FUNDING.yml 2021-05-19 16:50:12 +08:00
Jack
5a1b8c8da6
Read chunked remote words (#817)
Fix: chunked content could not be read because a chunked response carries no content length
There is an issue #780, and this fixes it
2020-09-06 16:34:40 +08:00
medcl
1375ca6d39 fix #789 2020-06-10 16:05:01 +08:00
Howard
4619effa15 translate log messages from Chinese to English (#746) 2019-12-19 15:31:04 +08:00
medcl
5f53f1a5bf Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2019-10-07 19:01:51 +08:00
medcl
904a7493ea update to 7.4.0 2019-10-07 19:01:29 +08:00
zhipingpan
06e8a23d18 Update AnalyzeContext.java (#673) 2019-05-01 16:57:44 +08:00
Hongliang Wang
a1d6ba8ca2 Correct Search Analyzer (#668)
The former search analyzer `ik-max-word` gives wrong results, contrary to what is described later in the README file.
2019-04-19 20:23:43 +08:00
medcl
90c9b58354 update example 2019-04-11 10:07:22 +08:00
medcl
ba8bb85f31 update to support 7.x 2019-04-11 09:35:19 +08:00
medcl
125ac3c5e5 Merge branch 'pr/621' 2019-03-25 11:02:22 +08:00
medcl
f0dd522e60 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2019-03-25 10:42:23 +08:00
medcl
9eaa2b90eb fix NPE 2019-03-06 19:02:56 +08:00
pengcong90
9873489ba7
Update AnalyzeContext.java
With ik_smart, segmenting 金力泰合同审批 yields (金 力 泰 合同 审批), while ik_max_word yields (金 力 泰合 合同 审批 批), so a search for 金力泰 within 金力泰合同审批 can fail to match. Reading the source shows that 泰 is not in the dictionary while 泰合 and 合同 are, so during smart disambiguation the rule preferring higher reverse probability discards 泰合, and 泰 ends up segmented on its own. When emitting results we can check for this case: if a single character is absent from the dictionary but its lexemes conflict, also split out the single characters of the preceding lexeme of the overlapping pair, which solves this problem
2018-11-21 11:00:29 +08:00
杨晓东
949531572b Adapt to elasticsearch 6.5.0 (#615)
Signed-off-by: 杨晓东 <03131302@163.com>
2018-11-20 13:36:39 +00:00
byronhe
1d750a9bdd Update AnalyzeContext.java (#617) 2018-11-20 13:15:37 +00:00
黄松
3a7a81c29d Update README.md (#581) 2018-08-06 16:54:06 +08:00
medcl
1422d5b96c Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2018-07-26 10:25:44 +08:00
medcl
9fff6379ef remove deploy in travis 2018-07-26 10:25:16 +08:00
Rueian
5190dba198 Grant java.net.SocketPermission (#565) 2018-06-28 16:11:46 +08:00
wksw
83fa2ff8b2 Unzipping directly into the plugins directory keeps es from starting (#564)
Unzipping directly into plugins causes the error: Could not load plugin descriptor for plugin directory [plugin-descriptor.properties]
2018-06-26 10:14:02 +08:00
medcl
0222529290 Remove intermediate elasticsearch directory within plugin zips 2018-06-19 11:25:29 +08:00
medcl
5e8d0df2be update es to 6.3.0 2018-06-19 09:14:57 +08:00
medcl
36e6d2d00b update travis 2018-05-06 17:06:19 +08:00
medcl
de1da42d38 update travis 2018-05-06 16:55:07 +08:00
zj0713001
3dcedde9e4 update es to 6.2.4 (#545) 2018-05-04 16:31:35 +08:00
medcl
21a859a48d Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2018-04-09 15:59:12 +08:00
medcl
816b8ddd4b fix ambiguity 2018-04-09 15:58:43 +08:00
Figroc Chen
7028b9ea05 BOM handling of dict file (#517)
Signed-off-by: Peng Chen <figroc@gmail.com>
2018-04-02 13:29:22 +08:00
medcl
4ab2616a96 update es to 6.2.3 2018-04-02 12:24:48 +08:00
medcl
7c9b4771b3 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2018-03-05 15:29:35 -08:00
medcl
22de5be444 update es to 6.2.2 2018-03-05 15:29:25 -08:00
Figroc Chen
0e8ddbd749 ext dict & stopwords can be a dir (#437)
allow dir for ext_dict and ext_stopwords in IKAnalyzer.cfg.xml
By: Peng Chen<figroc@gmail.com>
2018-02-24 16:40:21 +08:00
medcl
7a1445fdda update plugin-descriptor.properties, Close #514 2018-02-10 14:07:20 +08:00
medcl
353cefd5b8 fix example, Close #512 2018-02-09 12:27:46 +08:00
medcl
0922152fb8 update es to 6.2.1 2018-02-09 12:12:22 +08:00
muliuyun
cc01c881af Update ES to 6.1.3 (#510)
* update es to 6.1.3
2018-02-09 11:44:56 +08:00
medcl
eb21c796d8 update es to 6.1.2 2018-01-19 17:32:41 +08:00
medcl
5828cb1c72 update es to 6.1.1 2017-12-20 09:59:49 +08:00
medcl
dc739d2cee update es to 6.0.1 2017-12-20 09:57:14 +08:00
medcl
2851cc2501 update es to 6.0.0 2017-11-15 20:04:49 +08:00
medcl
b32366489b update es to 5.6.4 2017-11-15 19:57:30 +08:00
medcl
6a55e3af76 update es to 5.6.3 2017-10-19 09:56:51 +02:00
medcl
7636e1a234 update es to 5.6.2 2017-10-19 09:51:39 +02:00
medcl
1f2dfbffd5 update es to 5.6.1 2017-09-19 15:35:31 +08:00
medcl
2541e35991 update es to 5.6.0 2017-09-15 10:53:29 +08:00
medcl
55a4f05666 update es to 5.5.3 2017-09-15 10:47:47 +08:00
medcl
6309787f94 update es to 5.5.2 2017-08-30 20:34:07 +08:00
medcl
c4c498a3aa update example 2017-08-03 17:10:30 +08:00
medcl
8da12f3492 update es to 5.5.1 2017-08-03 17:00:02 +08:00
medcl
50230bfa64 fix install by plugin command 2017-08-03 16:59:35 +08:00
杨晓东
adf282f115 Update pom.xml, adjust plugin version to 5.5.0 (#401)
* Commit

Signed-off-by: 杨晓东 <03131302@163.com>

* Update README.md
2017-07-12 20:26:48 +08:00
medcl
1a62eb1651 update es to 5.4.3 2017-07-01 17:48:25 +08:00
medcl
455b672e5a update es to 5.4.2 2017-06-22 10:19:00 +08:00
medcl
2d16b56728 update es to 5.4.1 2017-05-16 10:03:55 +08:00
Zhang Yixin
1987d6ace4 update es to 5.4.0 (#369) 2017-05-16 09:48:03 +08:00
medcl
60e5e7768f update es to 5.3.2 2017-04-28 15:26:29 +08:00
medcl
7dfeb25c8f update readme 2017-04-07 21:11:22 +08:00
medcl
e7d968ffa8 update es to 5.3.0 2017-04-01 14:52:04 +08:00
medcl
a1fea66be8 update es to 5.2.2 2017-03-02 22:53:58 +08:00
medcl
dbb45eec56 update es to 5.2.1 2017-02-15 12:44:59 +08:00
medcl
c5a1553850 update oss version 2017-02-05 18:17:13 +08:00
medcl
400206511d update to es 5.2.0 2017-02-05 18:15:43 +08:00
medcl
d1d216a195 update es to 5.1.2 2017-01-19 10:18:34 +08:00
medcl
494576998a update es to 5.1.1 2016-12-13 17:33:10 +08:00
medcl
b85b487569 update es to v5.0.2 2016-11-30 09:33:29 +08:00
medcl
ffb88ee0fa update es to v5.0.1 2016-11-16 11:45:47 +08:00
medcl
e08d9d9be5 update es to 5.0.0 2016-10-27 16:10:56 +08:00
medcl
754572b2b9 update es to 5.0.0-rc1 2016-10-13 16:26:43 +08:00
medcl
e0ada4440e update README 2016-09-28 12:17:11 +02:00
medcl
17f6e982a5 Merge branch 'master' of github.com:medcl/elasticsearch-analysis-ik 2016-09-28 12:16:09 +02:00
medcl
b6ec9c0a00 update to support es5.0.0-beta1, Closes #282 2016-09-28 12:14:24 +02:00
Hsu Chen-Wei
7c92a10fc0 Add steps for installation (#268)
Tell users to switch to the corresponding tag before compiling.
2016-09-06 04:43:23 +03:00
medcl
f28ec3c3c2 update travis config 2016-08-23 11:39:19 +08:00
medcl
bfcebccd0f update readme 2016-08-23 00:33:52 +08:00
medcl
82c6369501 unify compiler plugin version 2016-08-18 16:28:22 +08:00
medcl
e637c4b1b2 update readme,pom.xml 2016-08-18 15:51:34 +08:00
medcl
ac2b78acd0 bump up compiler to use 1.8 2016-08-18 15:26:46 +08:00
medcl
168a798da8 support es 5.0.0-alpha5 2016-08-18 11:25:45 +08:00
30 changed files with 1085 additions and 651 deletions

2
.github/FUNDING.yml vendored Normal file

@ -0,0 +1,2 @@
patreon: medcl
custom: ["https://www.buymeacoffee.com/medcl"]

.travis.yml

@ -1,11 +1,9 @@
sudo: required
jdk:
- oraclejdk8
install: true
script:
- sudo apt-get update && sudo apt-get install oracle-java8-installer
- java -version
language: java
script: mvn clean package
deploy:
provider: releases
api_key:
secure: llxJZlRYBIWINl5XI42RpEe+jTxlmSP6MX+oTNZa4oFjEeN9Kdd1G8+S3HSIhCc31RoF/2zeNsM9OehRi1O6bweNSQ9vjlKZQPD8FYcHaHpYW0U7h/OMbEeC794fAghm9ZsmOTNymdvbAXL14nJTrwOW9W8VqoZT9Jx7Ejad63Y=
file: target/releases/elasticsearch-analysis-ik-*.zip
file_glob: true
on:
repo: medcl/elasticsearch-analysis-ik
tags: true

README.md

@ -10,11 +10,11 @@ Versions
IK version | ES version
-----------|-----------
master | 2.3.5 -> master
master | 7.x -> master
6.x | 6.x
5.x | 5.x
1.10.6 | 2.4.6
1.9.5 | 2.3.5
1.9.4 | 2.3.4
1.9.3 | 2.3.3
1.9.0 | 2.3.0
1.8.1 | 2.2.1
1.7.0 | 2.1.1
1.5.0 | 2.0.0
@ -26,22 +26,25 @@ master | 2.3.5 -> master
Install
-------
1.compile
1.download or compile
`mvn package`
* optional 1 - download pre-built package from here: https://github.com/medcl/elasticsearch-analysis-ik/releases
copy and unzip `target/releases/elasticsearch-analysis-ik-{version}.zip` to `your-es-root/plugins/ik`
create plugin folder `cd your-es-root/plugins/ && mkdir ik`
unzip plugin to folder `your-es-root/plugins/ik`
* optional 2 - use elasticsearch-plugin to install (supported since v5.5.1):
```
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
```
NOTE: replace `6.3.0` with your own elasticsearch version
2.restart elasticsearch
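For completeness, a minimal sketch of the manual install (optional 1) above; the `6.3.0` version number and the `your-es-root` path are placeholders to adjust to your own setup:
```bash
# download a pre-built release matching your elasticsearch version (version is a placeholder)
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip

# create the plugin folder and unzip the package into it
mkdir your-es-root/plugins/ik
unzip elasticsearch-analysis-ik-6.3.0.zip -d your-es-root/plugins/ik

# then restart elasticsearch
```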
Tips:
ik_max_word: splits the text at the finest granularity; for example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination;
ik_smart: performs the coarsest-grained split; for example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”.
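To compare the two analyzers directly, a hedged example using the standard `_analyze` API (the `index` name assumes the Quick Example below has been run):
```bash
# finest-grained segmentation
curl -XGET "http://localhost:9200/index/_analyze" -H 'Content-Type: application/json' -d'
{"analyzer":"ik_max_word","text":"中华人民共和国国歌"}'

# coarsest-grained segmentation
curl -XGET "http://localhost:9200/index/_analyze" -H 'Content-Type: application/json' -d'
{"analyzer":"ik_smart","text":"中华人民共和国国歌"}'
```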
#### Quick Example
@ -54,52 +57,41 @@ curl -XPUT http://localhost:9200/index
2.create a mapping
```bash
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d'
{
"fulltext": {
"_all": {
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word",
"term_vector": "no",
"store": "false"
},
"properties": {
"content": {
"type": "string",
"store": "no",
"term_vector": "with_positions_offsets",
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word",
"include_in_all": "true",
"boost": 8
}
"search_analyzer": "ik_smart"
}
}
}'
```
3.index some docs
```bash
curl -XPOST http://localhost:9200/index/fulltext/1 -d'
curl -XPOST http://localhost:9200/index/_create/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
```
```bash
curl -XPOST http://localhost:9200/index/fulltext/2 -d'
curl -XPOST http://localhost:9200/index/_create/2 -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'
```
```bash
curl -XPOST http://localhost:9200/index/fulltext/3 -d'
curl -XPOST http://localhost:9200/index/_create/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查韩警平均每天扣1艘中国渔船"}
'
```
```bash
curl -XPOST http://localhost:9200/index/fulltext/4 -d'
curl -XPOST http://localhost:9200/index/_create/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
```
@ -107,9 +99,9 @@ curl -XPOST http://localhost:9200/index/fulltext/4 -d'
4.query with highlighting
```bash
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
curl -XPOST http://localhost:9200/index/_search -H 'Content-Type:application/json' -d'
{
"query" : { "term" : { "content" : "中国" }},
"query" : { "match" : { "content" : "中国" }},
"highlight" : {
"pre_tags" : ["<tag1>", "<tag2>"],
"post_tags" : ["</tag1>", "</tag2>"],
@ -226,6 +218,7 @@ have fun.
```bash
git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
git checkout tags/{version}
mvn clean
mvn compile
mvn package
@ -236,7 +229,27 @@ mvn package
3. Tokenization test fails
Please test by calling the analyze API under a specific index, rather than calling the analyze API directly
e.g.: http://localhost:9200/your_index/_analyze?text=中华人民共和国MN&tokenizer=my_ik
e.g.:
```bash
curl -XGET "http://localhost:9200/your_index/_analyze" -H 'Content-Type: application/json' -d'
{
"text":"中华人民共和国MN","tokenizer": "my_ik"
}'
```
4. What is the difference between ik_max_word and ik_smart?
ik_max_word: splits the text at the finest granularity; for example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination; suitable for Term Query
ik_smart: performs the coarsest-grained split; for example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”; suitable for Phrase queries.
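As a hedged illustration of that guidance, reusing the `index`/`content` mapping from the Quick Example (indexed with `ik_max_word`, searched with `ik_smart`), a phrase query looks like:
```bash
# phrase search against the coarser ik_smart tokens
curl -XPOST "http://localhost:9200/index/_search" -H 'Content-Type: application/json' -d'
{"query":{"match_phrase":{"content":"中华人民共和国"}}}'
```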
Changes
------
*Since v5.0.0*
- The analyzer and tokenizer named `ik` have been removed; please use `ik_smart` and `ik_max_word` instead
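A hedged migration sketch for that change: wherever index settings referenced the removed `ik` analyzer, point them at `ik_max_word` or `ik_smart` instead (`my_index` is a placeholder name):
```bash
# set the default analyzer of a new index to ik_max_word
curl -XPUT "http://localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{"settings":{"analysis":{"analyzer":{"default":{"type":"ik_max_word"}}}}}'
```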
Thanks

config/IKAnalyzer.cfg.xml

@ -3,9 +3,9 @@
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- users can configure their own extension dictionary here -->
<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
<entry key="ext_dict"></entry>
<!-- users can configure their own extension stopword dictionary here -->
<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
<entry key="ext_stopwords"></entry>
<!-- users can configure a remote extension dictionary here -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- users can configure a remote extension stopword dictionary here -->
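For reference, a hedged sketch of a filled-in `IKAnalyzer.cfg.xml`; the dictionary paths and the remote URL are placeholders (multiple local files are separated by semicolons, as in the removed lines above):
```bash
# write a sample config into the plugin's config directory (paths/URL are placeholders)
cat > config/IKAnalyzer.cfg.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries (placeholder paths) -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- local extension stopword dictionary (placeholder path) -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- remote extension dictionary (hypothetical URL) -->
    <entry key="remote_ext_dict">http://example.com/ext_words.txt</entry>
</properties>
EOF
```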


@ -1,14 +0,0 @@
medcl
elastic
elasticsearch
kogstash
kibana
marvel
shield
watcher
beats
packetbeat
filebeat
topbeat
metrixbeat
kimchy

475
licenses/lucene-LICENSE.txt Normal file

@ -0,0 +1,475 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Some code in core/src/java/org/apache/lucene/util/UnicodeUtil.java was
derived from unicode conversion examples available at
http://www.unicode.org/Public/PROGRAMS/CVTUTF. Here is the copyright
from those sources:
/*
* Copyright 2001-2004 Unicode, Inc.
*
* Disclaimer
*
* This source code is provided as is by Unicode, Inc. No claims are
* made as to fitness for any particular purpose. No warranties of any
* kind are expressed or implied. The recipient agrees to determine
* applicability of information provided. If this file has been
* purchased on magnetic or optical media from Unicode, Inc., the
* sole remedy for any claim will be exchange of defective media
* within 90 days of receipt.
*
* Limitations on Rights to Redistribute This Code
*
* Unicode, Inc. hereby grants the right to freely use the information
* supplied in this file in the creation of products supporting the
* Unicode Standard, and to make copies of this file in any form
* for internal or external distribution as long as this notice
* remains attached.
*/
Some code in core/src/java/org/apache/lucene/util/ArrayUtil.java was
derived from Python 2.4.2 sources available at
http://www.python.org. Full license is here:
http://www.python.org/download/releases/2.4.2/license/
Some code in core/src/java/org/apache/lucene/util/UnicodeUtil.java was
derived from Python 3.1.2 sources available at
http://www.python.org. Full license is here:
http://www.python.org/download/releases/3.1.2/license/
Some code in core/src/java/org/apache/lucene/util/automaton was
derived from Brics automaton sources available at
www.brics.dk/automaton/. Here is the copyright from those sources:
/*
* Copyright (c) 2001-2009 Anders Moeller
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. The name of the author may not be used to endorse or promote products
* derived from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
* OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
* THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
The levenshtein automata tables in core/src/java/org/apache/lucene/util/automaton
were automatically generated with the moman/finenight FSA package.
Here is the copyright for those sources:
# Copyright (c) 2010, Jean-Philippe Barrette-LaPierre, <jpb@rrette.com>
#
# Permission is hereby granted, free of charge, to any person
# obtaining a copy of this software and associated documentation
# files (the "Software"), to deal in the Software without
# restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following
# conditions:
#
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
# OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
# HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
# OTHER DEALINGS IN THE SOFTWARE.
Some code in core/src/java/org/apache/lucene/util/UnicodeUtil.java was
derived from ICU (http://www.icu-project.org)
The full license is available here:
http://source.icu-project.org/repos/icu/icu/trunk/license.html
/*
* Copyright (C) 1999-2010, International Business Machines
* Corporation and others. All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, and/or sell copies of the
* Software, and to permit persons to whom the Software is furnished to do so,
* provided that the above copyright notice(s) and this permission notice appear
* in all copies of the Software and that both the above copyright notice(s) and
* this permission notice appear in supporting documentation.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS.
* IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE
* LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR
* ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
* IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
* OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
* Except as contained in this notice, the name of a copyright holder shall not
* be used in advertising or otherwise to promote the sale, use or other
* dealings in this Software without prior written authorization of the
* copyright holder.
*/
The following license applies to the Snowball stemmers:
Copyright (c) 2001, Dr Martin Porter
Copyright (c) 2002, Richard Boulton
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holders nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The following license applies to the KStemmer:
Copyright © 2003,
Center for Intelligent Information Retrieval,
University of Massachusetts, Amherst.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. The names "Center for Intelligent Information Retrieval" and
"University of Massachusetts" must not be used to endorse or promote products
derived from this software without prior written permission. To obtain
permission, contact info@ciir.cs.umass.edu.
THIS SOFTWARE IS PROVIDED BY UNIVERSITY OF MASSACHUSETTS AND OTHER CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
The following license applies to the Morfologik project:
Copyright (c) 2006 Dawid Weiss
Copyright (c) 2007-2011 Dawid Weiss, Marcin Miłkowski
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Morfologik nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
---
The dictionary comes from Morfologik project. Morfologik uses data from
Polish ispell/myspell dictionary hosted at http://www.sjp.pl/slownik/en/ and
is licenced on the terms of (inter alia) LGPL and Creative Commons
ShareAlike. The part-of-speech tags were added in Morfologik project and
are not found in the data from sjp.pl. The tagset is similar to IPI PAN
tagset.
---
The following license applies to the Morfeusz project,
used by org.apache.lucene.analysis.morfologik.
BSD-licensed dictionary of Polish (SGJP)
http://sgjp.pl/morfeusz/
Copyright © 2011 Zygmunt Saloni, Włodzimierz Gruszczyński,
Marcin Woliński, Robert Wołosz
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the
distribution.
THIS SOFTWARE IS PROVIDED BY COPYRIGHT HOLDERS “AS IS” AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

191
licenses/lucene-NOTICE.txt Normal file

@ -0,0 +1,191 @@
Apache Lucene
Copyright 2014 The Apache Software Foundation
This product includes software developed at
The Apache Software Foundation (http://www.apache.org/).
Includes software from other Apache Software Foundation projects,
including, but not limited to:
- Apache Ant
- Apache Jakarta Regexp
- Apache Commons
- Apache Xerces
ICU4J, (under analysis/icu) is licensed under an MIT styles license
and Copyright (c) 1995-2008 International Business Machines Corporation and others
Some data files (under analysis/icu/src/data) are derived from Unicode data such
as the Unicode Character Database. See http://unicode.org/copyright.html for more
details.
Brics Automaton (under core/src/java/org/apache/lucene/util/automaton) is
BSD-licensed, created by Anders Møller. See http://www.brics.dk/automaton/
The levenshtein automata tables (under core/src/java/org/apache/lucene/util/automaton) were
automatically generated with the moman/finenight FSA library, created by
Jean-Philippe Barrette-LaPierre. This library is available under an MIT license,
see http://sites.google.com/site/rrettesite/moman and
http://bitbucket.org/jpbarrette/moman/overview/
The class org.apache.lucene.util.WeakIdentityMap was derived from
the Apache CXF project and is Apache License 2.0.
The Google Code Prettify is Apache License 2.0.
See http://code.google.com/p/google-code-prettify/
JUnit (junit-4.10) is licensed under the Common Public License v. 1.0
See http://junit.sourceforge.net/cpl-v10.html
This product includes code (JaspellTernarySearchTrie) from Java Spelling Checking Package (jaspell): http://jaspell.sourceforge.net/
License: The BSD License (http://www.opensource.org/licenses/bsd-license.php)
The snowball stemmers in
analysis/common/src/java/net/sf/snowball
were developed by Martin Porter and Richard Boulton.
The snowball stopword lists in
analysis/common/src/resources/org/apache/lucene/analysis/snowball
were developed by Martin Porter and Richard Boulton.
The full snowball package is available from
http://snowball.tartarus.org/
The KStem stemmer in
analysis/common/src/org/apache/lucene/analysis/en
was developed by Bob Krovetz and Sergio Guzman-Lara (CIIR-UMass Amherst)
under the BSD-license.
The Arabic,Persian,Romanian,Bulgarian, and Hindi analyzers (common) come with a default
stopword list that is BSD-licensed created by Jacques Savoy. These files reside in:
analysis/common/src/resources/org/apache/lucene/analysis/ar/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/fa/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/ro/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/bg/stopwords.txt,
analysis/common/src/resources/org/apache/lucene/analysis/hi/stopwords.txt
See http://members.unine.ch/jacques.savoy/clef/index.html.
The German,Spanish,Finnish,French,Hungarian,Italian,Portuguese,Russian and Swedish light stemmers
(common) are based on BSD-licensed reference implementations created by Jacques Savoy and
Ljiljana Dolamic. These files reside in:
analysis/common/src/java/org/apache/lucene/analysis/de/GermanLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/de/GermanMinimalStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/es/SpanishLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fi/FinnishLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/hu/HungarianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/it/ItalianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/pt/PortugueseLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/ru/RussianLightStemmer.java
analysis/common/src/java/org/apache/lucene/analysis/sv/SwedishLightStemmer.java
The Stempel analyzer (stempel) includes BSD-licensed software developed
by the Egothor project http://egothor.sf.net/, created by Leo Galambos, Martin Kvapil,
and Edmond Nolan.
The Polish analyzer (stempel) comes with a default
stopword list that is BSD-licensed created by the Carrot2 project. The file resides
in stempel/src/resources/org/apache/lucene/analysis/pl/stopwords.txt.
See http://project.carrot2.org/license.html.
The SmartChineseAnalyzer source code (smartcn) was
provided by Xiaoping Gao and copyright 2009 by www.imdict.net.
WordBreakTestUnicode_*.java (under modules/analysis/common/src/test/)
is derived from Unicode data such as the Unicode Character Database.
See http://unicode.org/copyright.html for more details.
The Morfologik analyzer (morfologik) includes BSD-licensed software
developed by Dawid Weiss and Marcin Miłkowski (http://morfologik.blogspot.com/).
Morfologik uses data from Polish ispell/myspell dictionary
(http://www.sjp.pl/slownik/en/) licenced on the terms of (inter alia)
LGPL and Creative Commons ShareAlike.
Morfologic includes data from BSD-licensed dictionary of Polish (SGJP)
(http://sgjp.pl/morfeusz/)
Servlet-api.jar and javax.servlet-*.jar are under the CDDL license, the original
source code for this can be found at http://www.eclipse.org/jetty/downloads.php
===========================================================================
Kuromoji Japanese Morphological Analyzer - Apache Lucene Integration
===========================================================================
This software includes a binary and/or source version of data from
mecab-ipadic-2.7.0-20070801
which can be obtained from
http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz
or
http://jaist.dl.sourceforge.net/project/mecab/mecab-ipadic/2.7.0-20070801/mecab-ipadic-2.7.0-20070801.tar.gz
===========================================================================
mecab-ipadic-2.7.0-20070801 Notice
===========================================================================
Nara Institute of Science and Technology (NAIST),
the copyright holders, disclaims all warranties with regard to this
software, including all implied warranties of merchantability and
fitness, in no event shall NAIST be liable for
any special, indirect or consequential damages or any damages
whatsoever resulting from loss of use, data or profits, whether in an
action of contract, negligence or other tortuous action, arising out
of or in connection with the use or performance of this software.
A large portion of the dictionary entries
originate from ICOT Free Software. The following conditions for ICOT
Free Software applies to the current dictionary as well.
Each User may also freely distribute the Program, whether in its
original form or modified, to any third party or parties, PROVIDED
that the provisions of Section 3 ("NO WARRANTY") will ALWAYS appear
on, or be attached to, the Program, which is distributed substantially
in the same form as set out herein and that such intended
distribution, if actually made, will neither violate or otherwise
contravene any of the laws and regulations of the countries having
jurisdiction over the User or the intended distribution itself.
NO WARRANTY
The program was produced on an experimental basis in the course of the
research and development conducted during the project and is provided
to users as so produced on an experimental basis. Accordingly, the
program is provided without any warranty whatsoever, whether express,
implied, statutory or otherwise. The term "warranty" used herein
includes, but is not limited to, any warranty of the quality,
performance, merchantability and fitness for a particular purpose of
the program and the nonexistence of any infringement or violation of
any right of any third party.
Each user of the program will agree and understand, and be deemed to
have agreed and understood, that there is no warranty whatsoever for
the program and, accordingly, the entire risk arising from or
otherwise connected with the program is assumed by the user.
Therefore, neither ICOT, the copyright holder, or any other
organization that participated in or was otherwise related to the
development of the program and their respective officials, directors,
officers and other employees shall be held liable for any and all
damages, including, without limitation, general, special, incidental
and consequential damages, arising out of or otherwise in connection
with the use or inability to use the program or any product, material
or result produced or otherwise obtained by using the program,
regardless of whether they have been advised of, or otherwise had
knowledge of, the possibility of such damages at any time during the
project or thereafter. Each user will be deemed to have agreed to the
foregoing by his or her commencement of use of the program. The term
"use" as used herein includes, but is not limited to, the use,
modification, copying and distribution of the program and the
production of secondary products from the program.
In the case where the program, whether in its original form or
modified, was distributed or delivered to or received by a user from
any person, organization or entity other than ICOT, unless it makes or
grants independently of ICOT any specific warranty to the user in
writing, such person, organization or entity, will also be exempted
from and not be held liable to the user for any such damages as noted
above as far as the program is concerned.

47
pom.xml Normal file → Executable file

@ -6,14 +6,14 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-analysis-ik</artifactId>
<version>1.9.5</version>
<version>${elasticsearch.version}</version>
<packaging>jar</packaging>
<description>IK Analyzer for Elasticsearch</description>
<inceptionYear>2011</inceptionYear>
<properties>
<elasticsearch.version>2.3.5</elasticsearch.version>
<maven.compiler.target>1.7</maven.compiler.target>
<elasticsearch.version>8.4.1</elasticsearch.version>
<maven.compiler.target>1.8</maven.compiler.target>
<elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor>
<elasticsearch.plugin.name>analysis-ik</elasticsearch.plugin.name>
<elasticsearch.plugin.classname>org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin</elasticsearch.plugin.classname>
@ -34,10 +34,10 @@
<developers>
<developer>
<name>Medcl</name>
<email>medcl@elastic.co</email>
<organization>elastic</organization>
<organizationUrl>http://www.elastic.co</organizationUrl>
<name>INFINI Labs</name>
<email>hello@infini.ltd</email>
<organization>INFINI Labs</organization>
<organizationUrl>https://infinilabs.com</organizationUrl>
</developer>
</developers>
@ -51,16 +51,27 @@
<parent>
<groupId>org.sonatype.oss</groupId>
<artifactId>oss-parent</artifactId>
<version>7</version>
<version>9</version>
</parent>
<distributionManagement>
<snapshotRepository>
<id>oss.sonatype.org</id>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
</snapshotRepository>
<repository>
<id>oss.sonatype.org</id>
<url>https://oss.sonatype.org/service/local/staging/deploy/maven2/</url>
</repository>
</distributionManagement>
<repositories>
<repository>
<id>oss.sonatype.org</id>
<name>OSS Sonatype</name>
<releases><enabled>true</enabled></releases>
<snapshots><enabled>true</enabled></snapshots>
<url>http://oss.sonatype.org/content/repositories/releases/</url>
<url>https://oss.sonatype.org/content/repositories/releases/</url>
</repository>
</repositories>
@ -72,6 +83,7 @@
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
@ -79,10 +91,9 @@
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.18.0</version>
</dependency>
<dependency>
@ -113,8 +124,8 @@
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
<source>${maven.compiler.target}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
<plugin>
@ -205,10 +216,10 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<version>3.5.1</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
<source>${maven.compiler.target}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
<plugin>

src/main/assemblies/plugin.xml

@ -8,20 +8,25 @@
<fileSets>
<fileSet>
<directory>${project.basedir}/config</directory>
<outputDirectory>/config</outputDirectory>
<outputDirectory>config</outputDirectory>
</fileSet>
</fileSets>
<files>
<file>
<source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
<outputDirectory></outputDirectory>
<outputDirectory/>
<filtered>true</filtered>
</file>
<file>
<source>${project.basedir}/src/main/resources/plugin-security.policy</source>
<outputDirectory/>
<filtered>true</filtered>
</file>
</files>
<dependencySets>
<dependencySet>
<outputDirectory>/</outputDirectory>
<outputDirectory/>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<excludes>
@ -29,7 +34,7 @@
</excludes>
</dependencySet>
<dependencySet>
<outputDirectory>/</outputDirectory>
<outputDirectory/>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>

src/main/java/org/elasticsearch/index/analysis/IkAnalysisBinderProcessor.java

@ -1,23 +0,0 @@
package org.elasticsearch.index.analysis;
public class IkAnalysisBinderProcessor extends AnalysisModule.AnalysisBinderProcessor {
@Override
public void processTokenFilters(TokenFiltersBindings tokenFiltersBindings) {
}
@Override
public void processAnalyzers(AnalyzersBindings analyzersBindings) {
analyzersBindings.processAnalyzer("ik", IkAnalyzerProvider.class);
}
@Override
public void processTokenizers(TokenizersBindings tokenizersBindings) {
tokenizersBindings.processTokenizer("ik", IkTokenizerFactory.class);
}
}

src/main/java/org/elasticsearch/index/analysis/IkAnalyzerProvider.java

@ -1,27 +1,30 @@
package org.elasticsearch.index.analysis;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettingsService;
import org.elasticsearch.index.IndexSettings;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.dic.Dictionary;
import org.wltea.analyzer.lucene.IKAnalyzer;
public class IkAnalyzerProvider extends AbstractIndexAnalyzerProvider<IKAnalyzer> {
private final IKAnalyzer analyzer;
@Inject
public IkAnalyzerProvider(Index index, IndexSettingsService indexSettingsService, Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettingsService.getSettings(), name, settings);
public IkAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings,boolean useSmart) {
super(name, settings);
Configuration configuration=new Configuration(env,settings);
Configuration configuration=new Configuration(env,settings).setUseSmart(useSmart);
analyzer=new IKAnalyzer(configuration);
}
public static IkAnalyzerProvider getIkSmartAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkAnalyzerProvider(indexSettings,env,name,settings,true);
}
public static IkAnalyzerProvider getIkAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkAnalyzerProvider(indexSettings,env,name,settings,false);
}
@Override public IKAnalyzer get() {
return this.analyzer;
}

src/main/java/org/elasticsearch/index/analysis/IkTokenizerFactory.java

@ -1,24 +1,33 @@
package org.elasticsearch.index.analysis;
import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.settings.IndexSettingsService;
import org.elasticsearch.index.IndexSettings;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.lucene.IKTokenizer;
public class IkTokenizerFactory extends AbstractTokenizerFactory {
private Configuration configuration;
@Inject
public IkTokenizerFactory(Index index, IndexSettingsService indexSettingsService,Environment env, @Assisted String name, @Assisted Settings settings) {
super(index, indexSettingsService.getSettings(), name, settings);
public IkTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
super(indexSettings, settings,name);
configuration=new Configuration(env,settings);
}
public static IkTokenizerFactory getIkTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkTokenizerFactory(indexSettings,env, name, settings).setSmart(false);
}
public static IkTokenizerFactory getIkSmartTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
return new IkTokenizerFactory(indexSettings,env, name, settings).setSmart(true);
}
public IkTokenizerFactory setSmart(boolean smart){
this.configuration.setUseSmart(smart);
return this;
}
@Override
public Tokenizer create() {
return new IKTokenizer(configuration); }

src/main/java/org/elasticsearch/indices/analysis/IKIndicesAnalysis.java

@ -1,84 +0,0 @@
package org.elasticsearch.indices.analysis;
import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.component.AbstractComponent;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.analysis.AnalyzerScope;
import org.elasticsearch.index.analysis.PreBuiltAnalyzerProviderFactory;
import org.elasticsearch.index.analysis.PreBuiltTokenizerFactoryFactory;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.dic.Dictionary;
import org.wltea.analyzer.lucene.IKAnalyzer;
import org.wltea.analyzer.lucene.IKTokenizer;
/**
* Registers indices level analysis components so, if not explicitly configured,
* will be shared among all indices.
*/
public class IKIndicesAnalysis extends AbstractComponent {
private boolean useSmart=false;
@Inject
public IKIndicesAnalysis(final Settings settings,
IndicesAnalysisService indicesAnalysisService,Environment env) {
super(settings);
final Configuration configuration=new Configuration(env,settings).setUseSmart(false);
final Configuration smartConfiguration=new Configuration(env,settings).setUseSmart(true);
indicesAnalysisService.analyzerProviderFactories().put("ik",
new PreBuiltAnalyzerProviderFactory("ik", AnalyzerScope.GLOBAL,
new IKAnalyzer(configuration)));
indicesAnalysisService.analyzerProviderFactories().put("ik_smart",
new PreBuiltAnalyzerProviderFactory("ik_smart", AnalyzerScope.GLOBAL,
new IKAnalyzer(smartConfiguration)));
indicesAnalysisService.analyzerProviderFactories().put("ik_max_word",
new PreBuiltAnalyzerProviderFactory("ik_max_word", AnalyzerScope.GLOBAL,
new IKAnalyzer(configuration)));
indicesAnalysisService.tokenizerFactories().put("ik",
new PreBuiltTokenizerFactoryFactory(new TokenizerFactory() {
@Override
public String name() {
return "ik";
}
@Override
public Tokenizer create() {
return new IKTokenizer(configuration);
}
}));
indicesAnalysisService.tokenizerFactories().put("ik_smart",
new PreBuiltTokenizerFactoryFactory(new TokenizerFactory() {
@Override
public String name() {
return "ik_smart";
}
@Override
public Tokenizer create() {
return new IKTokenizer(smartConfiguration);
}
}));
indicesAnalysisService.tokenizerFactories().put("ik_max_word",
new PreBuiltTokenizerFactoryFactory(new TokenizerFactory() {
@Override
public String name() {
return "ik_max_word";
}
@Override
public Tokenizer create() {
return new IKTokenizer(configuration);
}
}));
}
}

src/main/java/org/elasticsearch/indices/analysis/IKIndicesAnalysisModule.java

@ -1,32 +0,0 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.elasticsearch.indices.analysis;
import org.elasticsearch.common.inject.AbstractModule;
/**
*/
public class IKIndicesAnalysisModule extends AbstractModule {
@Override
protected void configure() {
bind(IKIndicesAnalysis.class).asEagerSingleton();
}
}

src/main/java/org/elasticsearch/plugin/analysis/ik/AnalysisIkPlugin.java

@ -1,46 +1,41 @@
package org.elasticsearch.plugin.analysis.ik;
import org.elasticsearch.common.inject.AbstractModule;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.Module;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.ESLoggerFactory;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.analysis.AnalysisModule;
import org.elasticsearch.index.analysis.IkAnalysisBinderProcessor;
import org.elasticsearch.indices.analysis.IKIndicesAnalysisModule;
import org.apache.lucene.analysis.Analyzer;
import org.elasticsearch.index.analysis.AnalyzerProvider;
import org.elasticsearch.index.analysis.IkAnalyzerProvider;
import org.elasticsearch.index.analysis.IkTokenizerFactory;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;
import org.wltea.analyzer.cfg.Configuration;
import org.wltea.analyzer.dic.Dictionary;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Collections;
import java.util.logging.Logger;
import static java.rmi.Naming.bind;
import java.util.HashMap;
import java.util.Map;
public class AnalysisIkPlugin extends Plugin {
public class AnalysisIkPlugin extends Plugin implements AnalysisPlugin {
public static String PLUGIN_NAME = "analysis-ik";
@Override public String name() {
return PLUGIN_NAME;
}
@Override
public Map<String, AnalysisModule.AnalysisProvider<TokenizerFactory>> getTokenizers() {
Map<String, AnalysisModule.AnalysisProvider<TokenizerFactory>> extra = new HashMap<>();
@Override public String description() {
return PLUGIN_NAME;
extra.put("ik_smart", IkTokenizerFactory::getIkSmartTokenizerFactory);
extra.put("ik_max_word", IkTokenizerFactory::getIkTokenizerFactory);
return extra;
}
@Override
public Collection<Module> nodeModules() {
return Collections.<Module>singletonList(new IKIndicesAnalysisModule());
}
public void onModule(AnalysisModule module) {
module.addProcessor(new IkAnalysisBinderProcessor());
public Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> getAnalyzers() {
Map<String, AnalysisModule.AnalysisProvider<AnalyzerProvider<? extends Analyzer>>> extra = new HashMap<>();
extra.put("ik_smart", IkAnalyzerProvider::getIkSmartAnalyzerProvider);
extra.put("ik_max_word", IkAnalyzerProvider::getIkAnalyzerProvider);
return extra;
}
}
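Note on the new wiring: the method references registered above must match the functional shape of AnalysisModule.AnalysisProvider#get(IndexSettings, Environment, String, Settings). The factory classes themselves are not part of this compare, so the following is only a plausible sketch of the tokenizer side; the names follow the references above, and since the exact TokenizerFactory contract differs between ES versions, no @Override annotations are claimed.

public class IkTokenizerFactory implements TokenizerFactory {
    private final Configuration configuration;
    private final String name;

    private IkTokenizerFactory(Environment env, Settings settings, String name, boolean useSmart) {
        this.configuration = new Configuration(env, settings).setUseSmart(useSmart);
        this.name = name;
    }

    // both signatures line up with AnalysisProvider#get, so they can be passed as method references
    public static IkTokenizerFactory getIkSmartTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
        return new IkTokenizerFactory(env, settings, name, true);
    }

    public static IkTokenizerFactory getIkTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
        return new IkTokenizerFactory(env, settings, name, false);
    }

    public String name() { return name; }

    public Tokenizer create() { return new IKTokenizer(configuration); }
}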

src/main/java/org/wltea/analyzer/cfg/Configuration.java Normal file → Executable file
View File

@@ -4,21 +4,14 @@
package org.wltea.analyzer.cfg;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;
import org.elasticsearch.core.PathUtils;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin;
import org.wltea.analyzer.dic.Dictionary;
import java.io.*;
import java.net.URL;
import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.InvalidPropertiesFormatException;
import java.util.List;
import java.util.Properties;
public class Configuration {

View File

@@ -48,7 +48,7 @@ class AnalyzeContext {
private static final int BUFF_EXHAUST_CRITICAL = 100;
//character reading buffer
private char[] segmentBuff;
//character type array
private int[] charTypes;
@@ -267,6 +267,15 @@ class AnalyzeContext {
Lexeme l = path.pollFirst();
while(l != null){
this.results.add(l);
//no single-character entry in the dictionary, but lexemes overlap: emit the single characters of the earlier of the two intersecting lexemes
/*int innerIndex = index + 1;
for (; innerIndex < index + l.getLength(); innerIndex++) {
Lexeme innerL = path.peekFirst();
if (innerL != null && innerIndex == innerL.getBegin()) {
this.outputSingleCJK(innerIndex - 1);
}
}*/
//move index past this lexeme
index = l.getBegin() + l.getLength();
l = path.pollFirst();

View File

@@ -57,7 +57,7 @@ class DictSegment implements Comparable<DictSegment>{
DictSegment(Character nodeChar){
if(nodeChar == null){
throw new IllegalArgumentException("参数为空异常,字符不能为空");
throw new IllegalArgumentException("node char cannot be empty");
}
this.nodeChar = nodeChar;
}

src/main/java/org/wltea/analyzer/dic/Dictionary.java Normal file → Executable file
View File

@@ -26,29 +26,38 @@
package org.wltea.analyzer.dic;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.Files;
import java.nio.file.FileVisitResult;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;
import org.elasticsearch.SpecialPermission;
import org.elasticsearch.core.PathUtils;
import org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin;
import org.wltea.analyzer.cfg.Configuration;
import org.apache.logging.log4j.Logger;
import org.wltea.analyzer.help.ESPluginLoggerFactory;
/**
* Dictionary management class (singleton pattern)
@@ -62,30 +71,25 @@ public class Dictionary {
private DictSegment _MainDict;
private DictSegment _SurnameDict;
private DictSegment _QuantifierDict;
private DictSegment _SuffixDict;
private DictSegment _PrepDict;
private DictSegment _StopWords;
/**
* configuration object
*/
private Configuration configuration;
public static ESLogger logger = Loggers.getLogger("ik-analyzer");
private static final Logger logger = ESPluginLoggerFactory.getLogger(Dictionary.class.getName());
private static ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
public static final String PATH_DIC_MAIN = "main.dic";
public static final String PATH_DIC_SURNAME = "surname.dic";
public static final String PATH_DIC_QUANTIFIER = "quantifier.dic";
public static final String PATH_DIC_SUFFIX = "suffix.dic";
public static final String PATH_DIC_PREP = "preposition.dic";
public static final String PATH_DIC_STOP = "stopword.dic";
private static final String PATH_DIC_MAIN = "main.dic";
private static final String PATH_DIC_SURNAME = "surname.dic";
private static final String PATH_DIC_QUANTIFIER = "quantifier.dic";
private static final String PATH_DIC_SUFFIX = "suffix.dic";
private static final String PATH_DIC_PREP = "preposition.dic";
private static final String PATH_DIC_STOP = "stopword.dic";
private final static String FILE_NAME = "IKAnalyzer.cfg.xml";
private final static String EXT_DICT = "ext_dict";
@@ -120,15 +124,13 @@ public class Dictionary {
if (input != null) {
try {
props.loadFromXML(input);
} catch (InvalidPropertiesFormatException e) {
logger.error("ik-analyzer", e);
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
public String getProperty(String key){
private String getProperty(String key){
if(props!=null){
return props.getProperty(key);
}
@@ -140,7 +142,7 @@ public class Dictionary {
*
* @return Dictionary
*/
public static synchronized Dictionary initial(Configuration cfg) {
public static synchronized void initial(Configuration cfg) {
if (singleton == null) {
synchronized (Dictionary.class) {
if (singleton == null) {
@@ -164,14 +166,57 @@
}
}
return singleton;
}
}
}
return singleton;
}
public List<String> getExtDictionarys() {
private void walkFileTree(List<String> files, Path path) {
if (Files.isRegularFile(path)) {
files.add(path.toString());
} else if (Files.isDirectory(path)) try {
Files.walkFileTree(path, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
files.add(file.toString());
return FileVisitResult.CONTINUE;
}
@Override
public FileVisitResult visitFileFailed(Path file, IOException e) {
logger.error("[Ext Loading] listing files", e);
return FileVisitResult.CONTINUE;
}
});
} catch (IOException e) {
logger.error("[Ext Loading] listing files", e);
} else {
logger.warn("[Ext Loading] file not found: " + path);
}
}
private void loadDictFile(DictSegment dict, Path file, boolean critical, String name) {
try (InputStream is = new FileInputStream(file.toFile())) {
BufferedReader br = new BufferedReader(
new InputStreamReader(is, "UTF-8"), 512);
String word = br.readLine();
if (word != null) {
if (word.startsWith("\uFEFF"))
word = word.substring(1);
for (; word != null; word = br.readLine()) {
word = word.trim();
if (word.isEmpty()) continue;
dict.fillSegment(word.toCharArray());
}
}
} catch (FileNotFoundException e) {
logger.error("ik-analyzer: " + name + " not found", e);
if (critical) throw new RuntimeException("ik-analyzer: " + name + " not found!!!", e);
} catch (IOException e) {
logger.error("ik-analyzer: " + name + " loading failed", e);
}
}
private List<String> getExtDictionarys() {
List<String> extDictFiles = new ArrayList<String>(2);
String extDictCfg = getProperty(EXT_DICT);
if (extDictCfg != null) {
@@ -179,8 +224,8 @@ public class Dictionary {
String[] filePaths = extDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
Path file = PathUtils.get(filePath.trim());
extDictFiles.add(file.toString());
Path file = PathUtils.get(getDictRoot(), filePath.trim());
walkFileTree(extDictFiles, file);
}
}
@@ -188,7 +233,7 @@ public class Dictionary {
return extDictFiles;
}
public List<String> getRemoteExtDictionarys() {
private List<String> getRemoteExtDictionarys() {
List<String> remoteExtDictFiles = new ArrayList<String>(2);
String remoteExtDictCfg = getProperty(REMOTE_EXT_DICT);
if (remoteExtDictCfg != null) {
@@ -204,7 +249,7 @@ public class Dictionary {
return remoteExtDictFiles;
}
public List<String> getExtStopWordDictionarys() {
private List<String> getExtStopWordDictionarys() {
List<String> extStopWordDictFiles = new ArrayList<String>(2);
String extStopWordDictCfg = getProperty(EXT_STOP);
if (extStopWordDictCfg != null) {
@@ -212,8 +257,8 @@ public class Dictionary {
String[] filePaths = extStopWordDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
Path file = PathUtils.get(filePath.trim());
extStopWordDictFiles.add(file.toString());
Path file = PathUtils.get(getDictRoot(), filePath.trim());
walkFileTree(extStopWordDictFiles, file);
}
}
@@ -221,7 +266,7 @@ public class Dictionary {
return extStopWordDictFiles;
}
public List<String> getRemoteExtStopWordDictionarys() {
private List<String> getRemoteExtStopWordDictionarys() {
List<String> remoteExtStopWordDictFiles = new ArrayList<String>(2);
String remoteExtStopWordDictCfg = getProperty(REMOTE_EXT_STOP);
if (remoteExtStopWordDictCfg != null) {
@@ -237,7 +282,7 @@ public class Dictionary {
return remoteExtStopWordDictFiles;
}
public String getDictRoot() {
private String getDictRoot() {
return conf_dir.toAbsolutePath().toString();
}
@@ -249,7 +294,7 @@ public class Dictionary {
*/
public static Dictionary getSingleton() {
if (singleton == null) {
throw new IllegalStateException("词典尚未初始化请先调用initial方法");
throw new IllegalStateException("ik dict has not been initialized yet, please call initial method first.");
}
return singleton;
}
@@ -341,37 +386,7 @@ public class Dictionary {
// read the main dictionary file
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_MAIN);
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error(e.getMessage(), e);
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_MainDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
loadDictFile(_MainDict, file, false, "Main Dict");
// load the extension dictionaries
this.loadExtDict();
// load the remote custom dictionaries
@@ -385,44 +400,11 @@ public class Dictionary {
// read the extension dictionary configuration
List<String> extDictFiles = getExtDictionarys();
if (extDictFiles != null) {
InputStream is = null;
for (String extDictName : extDictFiles) {
// read the extension dictionary file
logger.info("[Dict Loading] " + extDictName);
Path file = PathUtils.get(getDictRoot(), extDictName);
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
// ignore the extension dictionary if it cannot be found
if (is == null) {
continue;
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
// load extension entries into the in-memory main dictionary
_MainDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
Path file = PathUtils.get(extDictName);
loadDictFile(_MainDict, file, false, "Extra Dict");
}
}
}
@@ -437,7 +419,7 @@ public class Dictionary {
List<String> lists = getRemoteWords(location);
// ignore the extension dictionary if it cannot be found
if (lists == null) {
logger.error("[Dict Loading] " + location + "加载失败");
logger.error("[Dict Loading] " + location + " load failed");
continue;
}
for (String theWord : lists) {
@@ -451,10 +433,17 @@
}
private static List<String> getRemoteWords(String location) {
SpecialPermission.check();
return AccessController.doPrivileged((PrivilegedAction<List<String>>) () -> {
return getRemoteWordsUnprivileged(location);
});
}
/**
* download custom entries from the remote server
*/
private static List<String> getRemoteWords(String location) {
private static List<String> getRemoteWordsUnprivileged(String location) {
List<String> buffer = new ArrayList<String>();
RequestConfig rc = RequestConfig.custom().setConnectionRequestTimeout(10 * 1000).setConnectTimeout(10 * 1000)
@@ -470,11 +459,18 @@
String charset = "UTF-8";
// determine the charset, defaulting to UTF-8
if (response.getEntity().getContentType().getValue().contains("charset=")) {
String contentType = response.getEntity().getContentType().getValue();
charset = contentType.substring(contentType.lastIndexOf("=") + 1);
HttpEntity entity = response.getEntity();
if(entity!=null){
Header contentType = entity.getContentType();
if(contentType!=null&&contentType.getValue()!=null){
String typeValue = contentType.getValue();
if(typeValue!=null&&typeValue.contains("charset=")){
charset = typeValue.substring(typeValue.lastIndexOf("=") + 1);
}
in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), charset));
}
if (entity.getContentLength() > 0 || entity.isChunked()) {
in = new BufferedReader(new InputStreamReader(entity.getContent(), charset));
String line;
while ((line = in.readLine()) != null) {
buffer.add(line);
@@ -483,12 +479,10 @@
response.close();
return buffer;
}
}
}
response.close();
} catch (ClientProtocolException e) {
logger.error("getRemoteWords {} error", e, location);
} catch (IllegalStateException e) {
logger.error("getRemoteWords {} error", e, location);
} catch (IOException e) {
} catch (IllegalStateException | IOException e) {
logger.error("getRemoteWords {} error", e, location);
}
return buffer;
@@ -503,80 +497,17 @@
// read the main stopword dictionary file
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_STOP);
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error(e.getMessage(), e);
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_StopWords.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
loadDictFile(_StopWords, file, false, "Main Stopwords");
// load the extension stopword dictionaries
List<String> extStopWordDictFiles = getExtStopWordDictionarys();
if (extStopWordDictFiles != null) {
is = null;
for (String extStopWordDictName : extStopWordDictFiles) {
logger.info("[Dict Loading] " + extStopWordDictName);
// read the extension dictionary file
file = PathUtils.get(getDictRoot(), extStopWordDictName);
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
// ignore the extension dictionary if it cannot be found
if (is == null) {
continue;
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
// load extension stopword entries into memory
_StopWords.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
file = PathUtils.get(extStopWordDictName);
loadDictFile(_StopWords, file, false, "Extra Stopwords");
}
}
@@ -587,7 +518,7 @@ public class Dictionary {
List<String> lists = getRemoteWords(location);
// ignore the extension dictionary if it cannot be found
if (lists == null) {
logger.error("[Dict Loading] " + location + "加载失败");
logger.error("[Dict Loading] " + location + " load failed");
continue;
}
for (String theWord : lists) {
@@ -609,146 +540,29 @@ public class Dictionary {
_QuantifierDict = new DictSegment((char) 0);
// read the quantifier dictionary file
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_QUANTIFIER);
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_QuantifierDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException ioe) {
logger.error("Quantifier Dictionary loading exception.");
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
loadDictFile(_QuantifierDict, file, false, "Quantifier");
}
private void loadSurnameDict() {
_SurnameDict = new DictSegment((char) 0);
DictSegment _SurnameDict = new DictSegment((char) 0);
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_SURNAME);
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
if (is == null) {
throw new RuntimeException("Surname Dictionary not found!!!");
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_SurnameDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
loadDictFile(_SurnameDict, file, true, "Surname");
}
private void loadSuffixDict() {
_SuffixDict = new DictSegment((char) 0);
DictSegment _SuffixDict = new DictSegment((char) 0);
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_SUFFIX);
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
if (is == null) {
throw new RuntimeException("Suffix Dictionary not found!!!");
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_SuffixDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
is.close();
is = null;
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
loadDictFile(_SuffixDict, file, true, "Suffix");
}
private void loadPrepDict() {
_PrepDict = new DictSegment((char) 0);
DictSegment _PrepDict = new DictSegment((char) 0);
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_PREP);
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
if (is == null) {
throw new RuntimeException("Preposition Dictionary not found!!!");
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_PrepDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
is.close();
is = null;
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
loadDictFile(_PrepDict, file, true, "Preposition");
}
public void reLoadMainDict() {
logger.info("重新加载词典...");
void reLoadMainDict() {
logger.info("start to reload ik dict.");
// load the dictionaries into a fresh instance, so that reloading does not disturb the dictionary currently in use
Dictionary tmpDict = new Dictionary(configuration);
tmpDict.configuration = getSingleton().configuration;
@@ -756,7 +570,7 @@ public class Dictionary {
tmpDict.loadStopWordDict();
_MainDict = tmpDict._MainDict;
_StopWords = tmpDict._StopWords;
logger.info("重新加载词典完毕...");
logger.info("reload ik dict finished.");
}
}
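Two behavioural points in the rewrite above deserve a note. First, the remote word list is now read when the response either reports a positive Content-Length or is chunked, so chunked transfers are no longer skipped. Second, the fetch runs inside a privileged block, so the SocketPermission granted in plugin-security.policy (end of this compare) applies no matter who triggers the reload. The wrapper reduces to the pattern below; fetchUnprivileged and the class name are hypothetical stand-ins for the HTTP code shown above.

import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.Collections;
import java.util.List;
import org.elasticsearch.SpecialPermission;

final class PrivilegedFetchSketch {
    static List<String> fetch(String location) {
        SpecialPermission.check(); // rejects callers that lack the special permission
        return AccessController.doPrivileged(
                (PrivilegedAction<List<String>>) () -> fetchUnprivileged(location));
    }

    private static List<String> fetchUnprivileged(String location) {
        return Collections.emptyList(); // stand-in: the real method performs the HTTP GET
    }
}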

View File

@@ -1,18 +1,21 @@
package org.wltea.analyzer.dic;
import java.io.IOException;
import java.security.AccessController;
import java.security.PrivilegedAction;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.SpecialPermission;
import org.wltea.analyzer.help.ESPluginLoggerFactory;
public class Monitor implements Runnable {
public static ESLogger logger= Loggers.getLogger("ik-analyzer");
private static final Logger logger = ESPluginLoggerFactory.getLogger(Monitor.class.getName());
private static CloseableHttpClient httpclient = HttpClients.createDefault();
/*
@@ -34,6 +37,15 @@ public class Monitor implements Runnable {
this.last_modified = null;
this.eTags = null;
}
public void run() {
SpecialPermission.check();
AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
this.runUnprivileged();
return null;
});
}
/**
* Monitoring workflow:
* send a HEAD request to the dictionary server
@@ -43,7 +55,7 @@
* sleep for 1 min, then return to the first step
*/
public void run() {
public void runUnprivileged() {
//timeout settings
RequestConfig rc = RequestConfig.custom().setConnectionRequestTimeout(10*1000)
@@ -80,11 +92,11 @@
//not modified, nothing to do
//noop
}else{
Dictionary.logger.info("remote_ext_dict {} return bad code {}" , location , response.getStatusLine().getStatusCode() );
logger.info("remote_ext_dict {} return bad code {}" , location , response.getStatusLine().getStatusCode() );
}
} catch (Exception e) {
Dictionary.logger.error("remote_ext_dict {} error!",e , location);
logger.error("remote_ext_dict {} error!",e , location);
}finally{
try {
if (response != null) {

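The run()/runUnprivileged() split applies the same SpecialPermission/doPrivileged pattern as Dictionary.getRemoteWords. The exchange itself is a conditional HEAD request; a condensed sketch follows, using the fields of the class above (the real code also stores the returned Last-Modified/ETag values in last_modified and eTags and compares them before deciding to reload):

HttpHead head = new HttpHead(location);
head.setConfig(rc);
// send the cached validators so an unchanged server can answer 304 Not Modified
if (last_modified != null) head.setHeader("If-Modified-Since", last_modified);
if (eTags != null) head.setHeader("If-None-Match", eTags);
try (CloseableHttpResponse response = httpclient.execute(head)) {
    if (response.getStatusLine().getStatusCode() == 200) {
        Dictionary.getSingleton().reLoadMainDict(); // something changed on the server
    }
    // 304 (or anything else): nothing to do this round
}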
View File

@@ -0,0 +1,27 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.spi.ExtendedLogger;
public class ESPluginLoggerFactory {
private ESPluginLoggerFactory() {
}
static public Logger getLogger(String name) {
return getLogger("", LogManager.getLogger(name));
}
static public Logger getLogger(String prefix, String name) {
return getLogger(prefix, LogManager.getLogger(name));
}
static public Logger getLogger(String prefix, Class<?> clazz) {
return getLogger(prefix, LogManager.getLogger(clazz.getName()));
}
static public Logger getLogger(String prefix, Logger logger) {
return (Logger)(prefix != null && prefix.length() != 0 ? new PrefixPluginLogger((ExtendedLogger)logger, logger.getName(), prefix) : logger);
}
}
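Callers obtain loggers the way the refactored classes above now do; a non-empty prefix routes through PrefixPluginLogger (next file), so every message carries a marker. A small hypothetical usage sketch, with class name and message chosen purely for illustration:

import org.apache.logging.log4j.Logger;
import org.wltea.analyzer.help.ESPluginLoggerFactory;

public class LoggerUsageSketch {
    // no prefix: the plain log4j logger is returned as-is
    private static final Logger plain = ESPluginLoggerFactory.getLogger(LoggerUsageSketch.class.getName());
    // "ik" prefix: messages are tagged with the "ik" marker via PrefixPluginLogger
    private static final Logger prefixed = ESPluginLoggerFactory.getLogger("ik", LoggerUsageSketch.class);

    public static void main(String[] args) {
        prefixed.info("dictionary loaded");
    }
}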

View File

@@ -0,0 +1,48 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;
import org.apache.logging.log4j.message.Message;
import org.apache.logging.log4j.message.MessageFactory;
import org.apache.logging.log4j.spi.ExtendedLogger;
import org.apache.logging.log4j.spi.ExtendedLoggerWrapper;
import java.util.WeakHashMap;
public class PrefixPluginLogger extends ExtendedLoggerWrapper {
private static final WeakHashMap<String, Marker> markers = new WeakHashMap<>();
private final Marker marker;
static int markersSize() {
return markers.size();
}
public String prefix() {
return this.marker.getName();
}
PrefixPluginLogger(ExtendedLogger logger, String name, String prefix) {
super(logger, name, (MessageFactory) null);
String actualPrefix = prefix == null ? "" : prefix;
MarkerManager.Log4jMarker actualMarker;
synchronized (markers) {
MarkerManager.Log4jMarker maybeMarker = (MarkerManager.Log4jMarker) markers.get(actualPrefix);
if (maybeMarker == null) {
actualMarker = new MarkerManager.Log4jMarker(actualPrefix);
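// assumption, mirroring Elasticsearch's PrefixLogger: the Log4jMarker above holds a strong
// reference to actualPrefix, so a fresh String copy is stored as the key; if the key were the
// same instance, the value would pin its own key and this WeakHashMap entry could never be collected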
markers.put(new String(actualPrefix), actualMarker);
} else {
actualMarker = maybeMarker;
}
}
this.marker = (Marker) actualMarker;
}
public void logMessage(String fqcn, Level level, Marker marker, Message message, Throwable t) {
assert marker == null;
super.logMessage(fqcn, level, this.marker, message, t);
}
}

View File

@@ -1,27 +1,29 @@
package org.wltea.analyzer.help;
import org.elasticsearch.common.logging.ESLogger;
import org.elasticsearch.common.logging.Loggers;
import org.apache.logging.log4j.Logger;
public class Sleep {
public static ESLogger logger= Loggers.getLogger("ik-analyzer");
private static final Logger logger = ESPluginLoggerFactory.getLogger(Sleep.class.getName());
public enum Type{MSEC,SEC,MIN,HOUR};
public static void sleep(Type type,int num){
public enum Type {MSEC, SEC, MIN, HOUR}
public static void sleep(Type type, int num) {
try {
switch(type){
switch (type) {
case MSEC:
Thread.sleep(num);
return;
case SEC:
Thread.sleep(num*1000);
Thread.sleep(num * 1000);
return;
case MIN:
Thread.sleep(num*60*1000);
Thread.sleep(num * 60 * 1000);
return;
case HOUR:
Thread.sleep(num*60*60*1000);
Thread.sleep(num * 60 * 60 * 1000);
return;
default:
System.err.println("wrong input type, must be one of MSEC, SEC, MIN, HOUR");

View File

@@ -38,21 +38,6 @@ version=${project.version}
#
# 'name': the plugin name
name=${elasticsearch.plugin.name}
### mandatory elements for site plugins:
#
# 'site': set to true to indicate contents of the _site/
# directory in the root of the plugin should be served.
site=${elasticsearch.plugin.site}
#
### mandatory elements for jvm plugins :
#
# 'jvm': true if the 'classname' class should be loaded
# from jar files in the root directory of the plugin.
# Note that only jar files in the root directory are
# added to the classpath for the plugin! If you need
# other resources, package them into a resources jar.
jvm=${elasticsearch.plugin.jvm}
#
# 'classname': the name of the class to load, fully-qualified.
classname=${elasticsearch.plugin.classname}
@@ -69,12 +54,3 @@ java.version=${maven.compiler.target}
# is loaded so Elasticsearch will refuse to start in the presence of
# plugins with the incorrect elasticsearch.version.
elasticsearch.version=${elasticsearch.version}
#
### deprecated elements for jvm plugins :
#
# 'isolated': true if the plugin should have its own classloader.
# passing false is deprecated, and only intended to support plugins
# that have hard dependencies against each other. If this is
# not specified, then the plugin is isolated by default.
isolated=${elasticsearch.plugin.isolated}
#

View File

@@ -0,0 +1,4 @@
grant {
// needed because of the hot reload functionality
permission java.net.SocketPermission "*", "connect,resolve";
};