Compare commits

6 Commits
master ... 5.x

Author | SHA1 | Message | Date
-------|------|---------|-----
medcl | 828922ed66 | update es to v5.6.9 | 2018-06-06 09:31:38 +08:00
medcl | 5b225ff5e9 | update es to 5.6.8 | 2018-03-05 15:25:48 -08:00
medcl | 6e495183dc | update es to 5.6.7 | 2018-02-09 12:32:56 +08:00
medcl | 2eae2c3839 | update es to 5.6.6 | 2018-02-09 12:31:42 +08:00
lostsquirrel | 7ca5eb6f7b | 5.x (#484): update version in install instruction | 2018-01-19 17:17:41 +08:00
Deyong Zhu | f120fcbb21 | support es 5.6.5 version (#486) | 2018-01-19 17:14:49 +08:00
17 changed files with 414 additions and 309 deletions

2
.github/FUNDING.yml vendored

@ -1,2 +0,0 @@
patreon: medcl
custom: ["https://www.buymeacoffee.com/medcl"]

View File

@ -7,3 +7,12 @@ script:
- java -version
language: java
script: mvn clean package
deploy:
provider: releases
api_key:
secure: llxJZlRYBIWINl5XI42RpEe+jTxlmSP6MX+oTNZa4oFjEeN9Kdd1G8+S3HSIhCc31RoF/2zeNsM9OehRi1O6bweNSQ9vjlKZQPD8FYcHaHpYW0U7h/OMbEeC794fAghm9ZsmOTNymdvbAXL14nJTrwOW9W8VqoZT9Jx7Ejad63Y=
file: target/releases/elasticsearch-analysis-ik-*.zip
file_glob: true
on:
repo: medcl/elasticsearch-analysis-ik
tags: true

View File

@ -3,16 +3,21 @@ IK Analysis for Elasticsearch
The IK Analysis plugin integrates the Lucene IK analyzer (http://code.google.com/p/ik-analyzer/) into Elasticsearch and supports custom dictionaries.
Analyzer: `ik_smart` , `ik_max_word` , Tokenizer: `ik_smart` , `ik_max_word`
Analyzer: `ik_smart` , `ik_max_word` , Tokenizer: `ik_smart` , `ik_max_word`
Versions
--------
IK version | ES version
-----------|-----------
master | 7.x -> master
6.x| 6.x
5.x| 5.x
master | 5.x -> master
5.6.9| 5.6.9
5.6.4| 5.6.4
5.5.3| 5.5.3
5.4.3| 5.4.3
5.3.3| 5.3.3
5.2.2| 5.2.2
5.1.2| 5.1.2
1.10.6 | 2.4.6
1.9.5 | 2.3.5
1.8.1 | 2.2.1
@ -29,18 +34,12 @@ Install
1.download or compile
* option 1 - download a pre-built package from here: https://github.com/medcl/elasticsearch-analysis-ik/releases
create plugin folder `cd your-es-root/plugins/ && mkdir ik`
unzip plugin to folder `your-es-root/plugins/ik`
unzip plugin to folder `your-es-root/plugins/`
* option 2 - use elasticsearch-plugin to install ( supported from version v5.5.1 ):
* option 2 - use elasticsearch-plugin to install ( version > v5.5.1 ):
```
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
```
NOTE: replace `6.3.0` with your own Elasticsearch version
`./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.5/elasticsearch-analysis-ik-5.6.5.zip`
2.restart elasticsearch
@ -57,41 +56,41 @@ curl -XPUT http://localhost:9200/index
2.create a mapping
```bash
curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d'
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
"search_analyzer": "ik_max_word"
}
}
}'
```
3.index some docs
```bash
curl -XPOST http://localhost:9200/index/_create/1 -H 'Content-Type:application/json' -d'
curl -XPOST http://localhost:9200/index/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
```
```bash
curl -XPOST http://localhost:9200/index/_create/2 -H 'Content-Type:application/json' -d'
curl -XPOST http://localhost:9200/index/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}
'
```
```bash
curl -XPOST http://localhost:9200/index/_create/3 -H 'Content-Type:application/json' -d'
curl -XPOST http://localhost:9200/index/fulltext/3 -d'
{"content":"中韩渔警冲突调查韩警平均每天扣1艘中国渔船"}
'
```
```bash
curl -XPOST http://localhost:9200/index/_create/4 -H 'Content-Type:application/json' -d'
curl -XPOST http://localhost:9200/index/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
```
@ -99,7 +98,7 @@ curl -XPOST http://localhost:9200/index/_create/4 -H 'Content-Type:application/j
4.query with highlighting
```bash
curl -XPOST http://localhost:9200/index/_search -H 'Content-Type:application/json' -d'
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
{
"query" : { "match" : { "content" : "中国" }},
"highlight" : {
@ -229,25 +228,19 @@ mvn package
3. Tokenization test fails
Call the analyze API under a specific index rather than calling the analyze API directly.
For example:
```bash
curl -XGET "http://localhost:9200/your_index/_analyze" -H 'Content-Type: application/json' -d'
{
"text":"中华人民共和国MN","tokenizer": "my_ik"
}'
```
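For example: http://localhost:9200/your_index/_analyze?text=中华人民共和国MN&tokenizer=my_ik
4. What is the difference between ik_max_word and ik_smart?
ik_max_word: splits the text at the finest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination; suitable for Term Query
ik_max_word: splits the text at the finest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination;
ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”; suitable for Phrase queries
ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”.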
Changes
------
*Since v5.0.0*
*5.0.0*
- Removed the analyzer and tokenizer named `ik`; use `ik_smart` and `ik_max_word` instead

16
pom.xml Executable file → Normal file

@ -12,7 +12,7 @@
<inceptionYear>2011</inceptionYear>
<properties>
<elasticsearch.version>8.4.1</elasticsearch.version>
<elasticsearch.version>5.6.9</elasticsearch.version>
<maven.compiler.target>1.8</maven.compiler.target>
<elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor>
<elasticsearch.plugin.name>analysis-ik</elasticsearch.plugin.name>
@ -21,7 +21,7 @@
<tests.rest.load_packaged>false</tests.rest.load_packaged>
<skip.unit.tests>true</skip.unit.tests>
<gpg.keyname>4E899B30</gpg.keyname>
<gpg.useagent>true</gpg.useagent>
<gpg.useagent>true</gpg.useagent>
</properties>
<licenses>
@ -34,10 +34,10 @@
<developers>
<developer>
<name>INFINI Labs</name>
<email>hello@infini.ltd</email>
<organization>INFINI Labs</organization>
<organizationUrl>https://infinilabs.com</organizationUrl>
<name>Medcl</name>
<email>medcl@elastic.co</email>
<organization>elastic</organization>
<organizationUrl>http://www.elastic.co</organizationUrl>
</developer>
</developers>
@ -71,7 +71,7 @@
<name>OSS Sonatype</name>
<releases><enabled>true</enabled></releases>
<snapshots><enabled>true</enabled></snapshots>
<url>https://oss.sonatype.org/content/repositories/releases/</url>
<url>http://oss.sonatype.org/content/repositories/releases/</url>
</repository>
</repositories>
@ -93,7 +93,7 @@
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.18.0</version>
<version>2.3</version>
</dependency>
<dependency>

View File

@ -8,25 +8,20 @@
<fileSets>
<fileSet>
<directory>${project.basedir}/config</directory>
<outputDirectory>config</outputDirectory>
<outputDirectory>elasticsearch/config</outputDirectory>
</fileSet>
</fileSets>
<files>
<file>
<source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
<outputDirectory/>
<filtered>true</filtered>
</file>
<file>
<source>${project.basedir}/src/main/resources/plugin-security.policy</source>
<outputDirectory/>
<outputDirectory>elasticsearch</outputDirectory>
<filtered>true</filtered>
</file>
</files>
<dependencySets>
<dependencySet>
<outputDirectory/>
<outputDirectory>elasticsearch</outputDirectory>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<excludes>
@ -34,7 +29,7 @@
</excludes>
</dependencySet>
<dependencySet>
<outputDirectory/>
<outputDirectory>elasticsearch</outputDirectory>
<useProjectArtifact>true</useProjectArtifact>
<useTransitiveFiltering>true</useTransitiveFiltering>
<includes>

View File

@ -10,7 +10,7 @@ public class IkAnalyzerProvider extends AbstractIndexAnalyzerProvider<IKAnalyzer
private final IKAnalyzer analyzer;
public IkAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings,boolean useSmart) {
super(name, settings);
super(indexSettings, name, settings);
Configuration configuration=new Configuration(env,settings).setUseSmart(useSmart);

View File

@ -11,7 +11,7 @@ public class IkTokenizerFactory extends AbstractTokenizerFactory {
private Configuration configuration;
public IkTokenizerFactory(IndexSettings indexSettings, Environment env, String name, Settings settings) {
super(indexSettings, settings,name);
super(indexSettings, name, settings);
configuration=new Configuration(env,settings);
}

View File

@ -4,7 +4,7 @@
package org.wltea.analyzer.cfg;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.core.PathUtils;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin;

View File

@ -48,7 +48,7 @@ class AnalyzeContext {
private static final int BUFF_EXHAUST_CRITICAL = 100;
// character read buffer
// character read buffer
private char[] segmentBuff;
// character type array
private int[] charTypes;
@ -267,15 +267,6 @@ class AnalyzeContext {
Lexeme l = path.pollFirst();
while(l != null){
this.results.add(l);
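// the dictionary has no single-character entry but lexemes conflict: emit the single characters of the earlier of the intersecting lexemes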
/*int innerIndex = index + 1;
for (; innerIndex < index + l.getLength(); innerIndex++) {
Lexeme innerL = path.peekFirst();
if (innerL != null && innerIndex == innerL.getBegin()) {
this.outputSingleCJK(innerIndex - 1);
}
}*/
// move index past the lexeme
index = l.getBegin() + l.getLength();
l = path.pollFirst();

View File

@ -57,7 +57,7 @@ class DictSegment implements Comparable<DictSegment>{
DictSegment(Character nodeChar){
if(nodeChar == null){
throw new IllegalArgumentException("node char cannot be empty");
throw new IllegalArgumentException("参数为空异常,字符不能为空");
}
this.nodeChar = nodeChar;
}

434
src/main/java/org/wltea/analyzer/dic/Dictionary.java Executable file → Normal file

@ -26,37 +26,29 @@
package org.wltea.analyzer.dic;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.Files;
import java.nio.file.FileVisitResult;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.elasticsearch.SpecialPermission;
import org.elasticsearch.core.PathUtils;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.common.logging.ESLoggerFactory;
import org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin;
import org.wltea.analyzer.cfg.Configuration;
import org.apache.logging.log4j.Logger;
import org.wltea.analyzer.help.ESPluginLoggerFactory;
/**
@ -71,8 +63,14 @@ public class Dictionary {
private DictSegment _MainDict;
private DictSegment _SurnameDict;
private DictSegment _QuantifierDict;
private DictSegment _SuffixDict;
private DictSegment _PrepDict;
private DictSegment _StopWords;
/**
@ -80,16 +78,16 @@ public class Dictionary {
*/
private Configuration configuration;
private static final Logger logger = ESPluginLoggerFactory.getLogger(Dictionary.class.getName());
private static final Logger logger = ESLoggerFactory.getLogger(Monitor.class.getName());
private static ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
private static final String PATH_DIC_MAIN = "main.dic";
private static final String PATH_DIC_SURNAME = "surname.dic";
private static final String PATH_DIC_QUANTIFIER = "quantifier.dic";
private static final String PATH_DIC_SUFFIX = "suffix.dic";
private static final String PATH_DIC_PREP = "preposition.dic";
private static final String PATH_DIC_STOP = "stopword.dic";
public static final String PATH_DIC_MAIN = "main.dic";
public static final String PATH_DIC_SURNAME = "surname.dic";
public static final String PATH_DIC_QUANTIFIER = "quantifier.dic";
public static final String PATH_DIC_SUFFIX = "suffix.dic";
public static final String PATH_DIC_PREP = "preposition.dic";
public static final String PATH_DIC_STOP = "stopword.dic";
private final static String FILE_NAME = "IKAnalyzer.cfg.xml";
private final static String EXT_DICT = "ext_dict";
@ -124,13 +122,15 @@ public class Dictionary {
if (input != null) {
try {
props.loadFromXML(input);
} catch (InvalidPropertiesFormatException e) {
logger.error("ik-analyzer", e);
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
private String getProperty(String key){
public String getProperty(String key){
if(props!=null){
return props.getProperty(key);
}
@ -142,7 +142,7 @@ public class Dictionary {
*
* @return Dictionary
*/
public static synchronized void initial(Configuration cfg) {
public static synchronized Dictionary initial(Configuration cfg) {
if (singleton == null) {
synchronized (Dictionary.class) {
if (singleton == null) {
@ -166,57 +166,14 @@ public class Dictionary {
}
}
return singleton;
}
}
}
return singleton;
}
private void walkFileTree(List<String> files, Path path) {
if (Files.isRegularFile(path)) {
files.add(path.toString());
} else if (Files.isDirectory(path)) try {
Files.walkFileTree(path, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
files.add(file.toString());
return FileVisitResult.CONTINUE;
}
@Override
public FileVisitResult visitFileFailed(Path file, IOException e) {
logger.error("[Ext Loading] listing files", e);
return FileVisitResult.CONTINUE;
}
});
} catch (IOException e) {
logger.error("[Ext Loading] listing files", e);
} else {
logger.warn("[Ext Loading] file not found: " + path);
}
}
private void loadDictFile(DictSegment dict, Path file, boolean critical, String name) {
try (InputStream is = new FileInputStream(file.toFile())) {
BufferedReader br = new BufferedReader(
new InputStreamReader(is, "UTF-8"), 512);
String word = br.readLine();
if (word != null) {
if (word.startsWith("\uFEFF"))
word = word.substring(1);
for (; word != null; word = br.readLine()) {
word = word.trim();
if (word.isEmpty()) continue;
dict.fillSegment(word.toCharArray());
}
}
} catch (FileNotFoundException e) {
logger.error("ik-analyzer: " + name + " not found", e);
if (critical) throw new RuntimeException("ik-analyzer: " + name + " not found!!!", e);
} catch (IOException e) {
logger.error("ik-analyzer: " + name + " loading failed", e);
}
}
private List<String> getExtDictionarys() {
public List<String> getExtDictionarys() {
List<String> extDictFiles = new ArrayList<String>(2);
String extDictCfg = getProperty(EXT_DICT);
if (extDictCfg != null) {
@ -224,8 +181,8 @@ public class Dictionary {
String[] filePaths = extDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
Path file = PathUtils.get(getDictRoot(), filePath.trim());
walkFileTree(extDictFiles, file);
Path file = PathUtils.get(filePath.trim());
extDictFiles.add(file.toString());
}
}
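The directory-aware lookup that `walkFileTree` adds on master can be sketched on its own. The following is a minimal illustration under stated assumptions, not the plugin's code: `ExtDictFilesSketch` and `listDictFiles` are hypothetical names, the configured entry is resolved against a config root, and `Files.walk` stands in for the `SimpleFileVisitor` shown in the diff. One behavioral difference: `Files.walk` may throw `UncheckedIOException` on an unreadable entry, where the visitor in the diff logs and continues.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the plugin: expands one configured
// dictionary entry into the list of regular files it refers to.
final class ExtDictFilesSketch {

    private ExtDictFilesSketch() {
    }

    static List<String> listDictFiles(Path root, String entry) throws IOException {
        Path path = root.resolve(entry.trim());
        if (Files.isRegularFile(path)) {
            // a plain file is returned as-is
            return Collections.singletonList(path.toString());
        }
        if (!Files.isDirectory(path)) {
            // a missing entry is skipped rather than treated as fatal
            return Collections.emptyList();
        }
        // a directory is expanded recursively; sorting keeps the order stable
        try (Stream<Path> walk = Files.walk(path)) {
            return walk.filter(Files::isRegularFile)
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

With this shape, an `ext_dict` entry can name either a single file or a folder of dictionaries, which is the capability the 5.x branch lacks.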
@ -233,7 +190,7 @@ public class Dictionary {
return extDictFiles;
}
private List<String> getRemoteExtDictionarys() {
public List<String> getRemoteExtDictionarys() {
List<String> remoteExtDictFiles = new ArrayList<String>(2);
String remoteExtDictCfg = getProperty(REMOTE_EXT_DICT);
if (remoteExtDictCfg != null) {
@ -249,7 +206,7 @@ public class Dictionary {
return remoteExtDictFiles;
}
private List<String> getExtStopWordDictionarys() {
public List<String> getExtStopWordDictionarys() {
List<String> extStopWordDictFiles = new ArrayList<String>(2);
String extStopWordDictCfg = getProperty(EXT_STOP);
if (extStopWordDictCfg != null) {
@ -257,8 +214,8 @@ public class Dictionary {
String[] filePaths = extStopWordDictCfg.split(";");
for (String filePath : filePaths) {
if (filePath != null && !"".equals(filePath.trim())) {
Path file = PathUtils.get(getDictRoot(), filePath.trim());
walkFileTree(extStopWordDictFiles, file);
Path file = PathUtils.get(filePath.trim());
extStopWordDictFiles.add(file.toString());
}
}
@ -266,7 +223,7 @@ public class Dictionary {
return extStopWordDictFiles;
}
private List<String> getRemoteExtStopWordDictionarys() {
public List<String> getRemoteExtStopWordDictionarys() {
List<String> remoteExtStopWordDictFiles = new ArrayList<String>(2);
String remoteExtStopWordDictCfg = getProperty(REMOTE_EXT_STOP);
if (remoteExtStopWordDictCfg != null) {
@ -282,7 +239,7 @@ public class Dictionary {
return remoteExtStopWordDictFiles;
}
private String getDictRoot() {
public String getDictRoot() {
return conf_dir.toAbsolutePath().toString();
}
@ -294,7 +251,7 @@ public class Dictionary {
*/
public static Dictionary getSingleton() {
if (singleton == null) {
throw new IllegalStateException("ik dict has not been initialized yet, please call initial method first.");
throw new IllegalStateException("词典尚未初始化请先调用initial方法");
}
return singleton;
}
@ -386,7 +343,37 @@ public class Dictionary {
// read the main dictionary file
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_MAIN);
loadDictFile(_MainDict, file, false, "Main Dict");
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error(e.getMessage(), e);
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_MainDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
// load the extension dictionaries
this.loadExtDict();
// load the remote custom dictionaries
@ -400,11 +387,44 @@ public class Dictionary {
// load the extension dictionary configuration
List<String> extDictFiles = getExtDictionarys();
if (extDictFiles != null) {
InputStream is = null;
for (String extDictName : extDictFiles) {
// read the extension dictionary file
logger.info("[Dict Loading] " + extDictName);
Path file = PathUtils.get(extDictName);
loadDictFile(_MainDict, file, false, "Extra Dict");
Path file = PathUtils.get(getDictRoot(), extDictName);
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
// skip the extension dictionary if it cannot be found
if (is == null) {
continue;
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
// load the extension dictionary words into the in-memory main dictionary
_MainDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
}
}
@ -419,7 +439,7 @@ public class Dictionary {
List<String> lists = getRemoteWords(location);
// skip the extension dictionary if it cannot be found
if (lists == null) {
logger.error("[Dict Loading] " + location + " load failed");
logger.error("[Dict Loading] " + location + "加载失败");
continue;
}
for (String theWord : lists) {
@ -433,17 +453,10 @@ public class Dictionary {
}
private static List<String> getRemoteWords(String location) {
SpecialPermission.check();
return AccessController.doPrivileged((PrivilegedAction<List<String>>) () -> {
return getRemoteWordsUnprivileged(location);
});
}
/**
* Download custom terms from a remote server
*/
private static List<String> getRemoteWordsUnprivileged(String location) {
private static List<String> getRemoteWords(String location) {
List<String> buffer = new ArrayList<String>();
RequestConfig rc = RequestConfig.custom().setConnectionRequestTimeout(10 * 1000).setConnectTimeout(10 * 1000)
@ -459,30 +472,25 @@ public class Dictionary {
String charset = "UTF-8";
// determine the charset; defaults to UTF-8
HttpEntity entity = response.getEntity();
if(entity!=null){
Header contentType = entity.getContentType();
if(contentType!=null&&contentType.getValue()!=null){
String typeValue = contentType.getValue();
if(typeValue!=null&&typeValue.contains("charset=")){
charset = typeValue.substring(typeValue.lastIndexOf("=") + 1);
}
}
if (entity.getContentLength() > 0 || entity.isChunked()) {
in = new BufferedReader(new InputStreamReader(entity.getContent(), charset));
String line;
while ((line = in.readLine()) != null) {
buffer.add(line);
}
in.close();
response.close();
return buffer;
}
}
if (response.getEntity().getContentType().getValue().contains("charset=")) {
String contentType = response.getEntity().getContentType().getValue();
charset = contentType.substring(contentType.lastIndexOf("=") + 1);
}
in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), charset));
String line;
while ((line = in.readLine()) != null) {
buffer.add(line);
}
in.close();
response.close();
return buffer;
}
response.close();
} catch (IllegalStateException | IOException e) {
} catch (ClientProtocolException e) {
logger.error("getRemoteWords {} error", location, e);
} catch (IllegalStateException e) {
logger.error("getRemoteWords {} error", location, e);
} catch (IOException e) {
logger.error("getRemoteWords {} error", location, e);
}
return buffer;
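The charset handling in `getRemoteWords` can be looked at in isolation. The sketch below is illustrative only: `ContentTypeCharset` and `charsetOf` are hypothetical names, and it assumes a header value shaped like `text/plain; charset=GBK`. It falls back to UTF-8 when the header is absent, carries no `charset=` parameter, or names a charset the JVM does not support, which covers the null cases that the master side of the diff guards against.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the plugin: picks the charset named in a
// Content-Type header value and defaults to UTF-8.
final class ContentTypeCharset {

    private ContentTypeCharset() {
    }

    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            // case-insensitive match on the "charset=" prefix
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = param.substring(8).trim();
                // the value may be quoted, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // covers both illegal and unsupported charset names
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

Returning a `Charset` rather than a name means an unknown charset is caught before the reader is constructed instead of surfacing later as an `UnsupportedEncodingException`.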
@ -497,17 +505,80 @@ public class Dictionary {
// read the main dictionary file
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_STOP);
loadDictFile(_StopWords, file, false, "Main Stopwords");
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error(e.getMessage(), e);
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_StopWords.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
// load the extension stopword dictionaries
List<String> extStopWordDictFiles = getExtStopWordDictionarys();
if (extStopWordDictFiles != null) {
is = null;
for (String extStopWordDictName : extStopWordDictFiles) {
logger.info("[Dict Loading] " + extStopWordDictName);
// read the extension dictionary file
file = PathUtils.get(extStopWordDictName);
loadDictFile(_StopWords, file, false, "Extra Stopwords");
file = PathUtils.get(getDictRoot(), extStopWordDictName);
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
// skip the extension dictionary if it cannot be found
if (is == null) {
continue;
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
// load the extension stopword dictionary words into memory
_StopWords.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
}
@ -518,7 +589,7 @@ public class Dictionary {
List<String> lists = getRemoteWords(location);
// skip the extension dictionary if it cannot be found
if (lists == null) {
logger.error("[Dict Loading] " + location + " load failed");
logger.error("[Dict Loading] " + location + "加载失败");
continue;
}
for (String theWord : lists) {
@ -540,29 +611,146 @@ public class Dictionary {
_QuantifierDict = new DictSegment((char) 0);
// read the quantifier dictionary file
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_QUANTIFIER);
loadDictFile(_QuantifierDict, file, false, "Quantifier");
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord = null;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_QuantifierDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException ioe) {
logger.error("Quantifier Dictionary loading exception.");
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
private void loadSurnameDict() {
DictSegment _SurnameDict = new DictSegment((char) 0);
_SurnameDict = new DictSegment((char) 0);
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_SURNAME);
loadDictFile(_SurnameDict, file, true, "Surname");
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
if (is == null) {
throw new RuntimeException("Surname Dictionary not found!!!");
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_SurnameDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
if (is != null) {
is.close();
is = null;
}
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
private void loadSuffixDict() {
DictSegment _SuffixDict = new DictSegment((char) 0);
_SuffixDict = new DictSegment((char) 0);
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_SUFFIX);
loadDictFile(_SuffixDict, file, true, "Suffix");
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
if (is == null) {
throw new RuntimeException("Suffix Dictionary not found!!!");
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_SuffixDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
is.close();
is = null;
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
private void loadPrepDict() {
DictSegment _PrepDict = new DictSegment((char) 0);
_PrepDict = new DictSegment((char) 0);
Path file = PathUtils.get(getDictRoot(), Dictionary.PATH_DIC_PREP);
loadDictFile(_PrepDict, file, true, "Preposition");
InputStream is = null;
try {
is = new FileInputStream(file.toFile());
} catch (FileNotFoundException e) {
logger.error("ik-analyzer", e);
}
if (is == null) {
throw new RuntimeException("Preposition Dictionary not found!!!");
}
try {
BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"), 512);
String theWord;
do {
theWord = br.readLine();
if (theWord != null && !"".equals(theWord.trim())) {
_PrepDict.fillSegment(theWord.trim().toCharArray());
}
} while (theWord != null);
} catch (IOException e) {
logger.error("ik-analyzer", e);
} finally {
try {
is.close();
is = null;
} catch (IOException e) {
logger.error("ik-analyzer", e);
}
}
}
void reLoadMainDict() {
logger.info("start to reload ik dict.");
public void reLoadMainDict() {
logger.info("重新加载词典...");
// load into a fresh instance so that loading interferes less with the dictionary currently in use
Dictionary tmpDict = new Dictionary(configuration);
tmpDict.configuration = getSingleton().configuration;
@ -570,7 +758,7 @@ public class Dictionary {
tmpDict.loadStopWordDict();
_MainDict = tmpDict._MainDict;
_StopWords = tmpDict._StopWords;
logger.info("reload ik dict finished.");
logger.info("重新加载词典完毕...");
}
}
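The main difference between the two branches in this file is that master funnels every dictionary through one `loadDictFile` helper, while 5.x repeats the open/read/close sequence per dictionary. A minimal sketch of that shared read loop follows. It is an illustration under assumptions, not the plugin's code: `DictFileSketch` and `readWords` are hypothetical names, and the words are returned as a list instead of being passed to `DictSegment.fillSegment`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the plugin: reads one word per line,
// skipping blank lines and a leading byte order mark.
final class DictFileSketch {

    private DictFileSketch() {
    }

    static List<String> readWords(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line = br.readLine();
            // a UTF-8 byte order mark decodes to U+FEFF at the start of the first line
            if (line != null && line.startsWith("\uFEFF")) {
                line = line.substring(1);
            }
            for (; line != null; line = br.readLine()) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

Because the stream is opened inside the `try`, a missing file surfaces as an exception at the call site rather than as a `null` stream handed to `InputStreamReader`, which is the failure mode of the 5.x loaders that log `FileNotFoundException` and continue.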

View File

@ -1,8 +1,6 @@
package org.wltea.analyzer.dic;
import java.io.IOException;
import java.security.AccessController;
import java.security.PrivilegedAction;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
@ -10,12 +8,11 @@ import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.SpecialPermission;
import org.wltea.analyzer.help.ESPluginLoggerFactory;
import org.elasticsearch.common.logging.ESLoggerFactory;
public class Monitor implements Runnable {
private static final Logger logger = ESPluginLoggerFactory.getLogger(Monitor.class.getName());
private static final Logger logger = ESLoggerFactory.getLogger(Monitor.class.getName());
private static CloseableHttpClient httpclient = HttpClients.createDefault();
/*
@ -37,15 +34,6 @@ public class Monitor implements Runnable {
this.last_modified = null;
this.eTags = null;
}
public void run() {
SpecialPermission.check();
AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
this.runUnprivileged();
return null;
});
}
/**
* Monitoring flow:
* send a HEAD request to the dictionary server
@ -55,7 +43,7 @@ public class Monitor implements Runnable {
* sleep for 1 min and return to step
*/
public void runUnprivileged() {
public void run() {
// timeout settings
RequestConfig rc = RequestConfig.custom().setConnectionRequestTimeout(10*1000)
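The body of the monitoring loop is elided from this hunk, so the following is only a sketch of how a HEAD-based change check could work, assuming it compares the `Last-Modified` and `ETag` response headers against the `last_modified` and `eTags` fields reset in the constructor. `RemoteDictState` and `update` are hypothetical names.

```java
// Hypothetical sketch, not the plugin's Monitor: remembers the validators of
// the last HEAD response and reports whether a new response differs.
final class RemoteDictState {

    private String lastModified; // last seen Last-Modified value, or null
    private String eTag;         // last seen ETag value, or null

    /**
     * Returns true when either header is present and differs from the stored
     * value, and stores the new values in that case.
     */
    boolean update(String newLastModified, String newETag) {
        boolean changed =
                (newLastModified != null && !newLastModified.equalsIgnoreCase(lastModified))
                || (newETag != null && !newETag.equalsIgnoreCase(eTag));
        if (changed) {
            lastModified = newLastModified;
            eTag = newETag;
        }
        return changed;
    }
}
```

A caller would trigger a dictionary reload only when `update` returns true; a response carrying neither header never triggers one.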

View File

@ -1,27 +0,0 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.spi.ExtendedLogger;
public class ESPluginLoggerFactory {
private ESPluginLoggerFactory() {
}
static public Logger getLogger(String name) {
return getLogger("", LogManager.getLogger(name));
}
static public Logger getLogger(String prefix, String name) {
return getLogger(prefix, LogManager.getLogger(name));
}
static public Logger getLogger(String prefix, Class<?> clazz) {
return getLogger(prefix, LogManager.getLogger(clazz.getName()));
}
static public Logger getLogger(String prefix, Logger logger) {
return (Logger)(prefix != null && prefix.length() != 0 ? new PrefixPluginLogger((ExtendedLogger)logger, logger.getName(), prefix) : logger);
}
}

View File

@ -1,48 +0,0 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;
import org.apache.logging.log4j.message.Message;
import org.apache.logging.log4j.message.MessageFactory;
import org.apache.logging.log4j.spi.ExtendedLogger;
import org.apache.logging.log4j.spi.ExtendedLoggerWrapper;
import java.util.WeakHashMap;
public class PrefixPluginLogger extends ExtendedLoggerWrapper {
private static final WeakHashMap<String, Marker> markers = new WeakHashMap();
private final Marker marker;
static int markersSize() {
return markers.size();
}
public String prefix() {
return this.marker.getName();
}
PrefixPluginLogger(ExtendedLogger logger, String name, String prefix) {
super(logger, name, (MessageFactory) null);
String actualPrefix = prefix == null ? "" : prefix;
WeakHashMap var6 = markers;
MarkerManager.Log4jMarker actualMarker;
synchronized (markers) {
MarkerManager.Log4jMarker maybeMarker = (MarkerManager.Log4jMarker) markers.get(actualPrefix);
if (maybeMarker == null) {
actualMarker = new MarkerManager.Log4jMarker(actualPrefix);
markers.put(new String(actualPrefix), actualMarker);
} else {
actualMarker = maybeMarker;
}
}
this.marker = (Marker) actualMarker;
}
public void logMessage(String fqcn, Level level, Marker marker, Message message, Throwable t) {
assert marker == null;
super.logMessage(fqcn, level, this.marker, message, t);
}
}

View File

@ -1,38 +1,36 @@
package org.wltea.analyzer.help;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.common.logging.ESLoggerFactory;
public class Sleep {
private static final Logger logger = ESPluginLoggerFactory.getLogger(Sleep.class.getName());
private static final Logger logger = ESLoggerFactory.getLogger(Sleep.class.getName());
public enum Type {MSEC, SEC, MIN, HOUR}
;
public static void sleep(Type type, int num) {
try {
switch (type) {
case MSEC:
Thread.sleep(num);
return;
case SEC:
Thread.sleep(num * 1000);
return;
case MIN:
Thread.sleep(num * 60 * 1000);
return;
case HOUR:
Thread.sleep(num * 60 * 60 * 1000);
return;
default:
System.err.println("输入类型错误应为MSEC,SEC,MIN,HOUR之一");
return;
}
} catch (InterruptedException e) {
logger.error(e.getMessage(), e);
}
}
public enum Type{MSEC,SEC,MIN,HOUR};
public static void sleep(Type type,int num){
try {
switch(type){
case MSEC:
Thread.sleep(num);
return;
case SEC:
Thread.sleep(num*1000);
return;
case MIN:
Thread.sleep(num*60*1000);
return;
case HOUR:
Thread.sleep(num*60*60*1000);
return;
default:
System.err.println("输入类型错误应为MSEC,SEC,MIN,HOUR之一");
return;
}
} catch (InterruptedException e) {
logger.error(e.getMessage(), e);
}
}
}
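Both versions of `Sleep` compute the delay with `int` arithmetic, so `num * 60 * 60 * 1000` overflows for 597 hours or more, and both log an `InterruptedException` without restoring the interrupt flag. A possible alternative is sketched below; `SleepSketch` and `toMillis` are hypothetical names, and the conversion relies on `java.util.concurrent.TimeUnit`, which works in `long`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical variant of the Sleep helper, not part of the plugin.
final class SleepSketch {

    enum Type { MSEC, SEC, MIN, HOUR }

    private SleepSketch() {
    }

    // Converts the amount to milliseconds using long arithmetic.
    static long toMillis(Type type, int num) {
        switch (type) {
            case MSEC:
                return TimeUnit.MILLISECONDS.toMillis(num);
            case SEC:
                return TimeUnit.SECONDS.toMillis(num);
            case MIN:
                return TimeUnit.MINUTES.toMillis(num);
            case HOUR:
                return TimeUnit.HOURS.toMillis(num);
            default:
                throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    static void sleep(Type type, int num) {
        try {
            Thread.sleep(toMillis(type, num));
        } catch (InterruptedException e) {
            // restore the flag so callers can still observe the interruption
            Thread.currentThread().interrupt();
        }
    }
}
```

As with the original, a negative `num` is not handled specially; `Thread.sleep` rejects it with an `IllegalArgumentException`.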

View File

@ -38,6 +38,21 @@ version=${project.version}
#
# 'name': the plugin name
name=${elasticsearch.plugin.name}
### mandatory elements for site plugins:
#
# 'site': set to true to indicate contents of the _site/
# directory in the root of the plugin should be served.
site=${elasticsearch.plugin.site}
#
### mandatory elements for jvm plugins :
#
# 'jvm': true if the 'classname' class should be loaded
# from jar files in the root directory of the plugin.
# Note that only jar files in the root directory are
# added to the classpath for the plugin! If you need
# other resources, package them into a resources jar.
jvm=${elasticsearch.plugin.jvm}
#
# 'classname': the name of the class to load, fully-qualified.
classname=${elasticsearch.plugin.classname}
@ -54,3 +69,12 @@ java.version=${maven.compiler.target}
# is loaded so Elasticsearch will refuse to start in the presence of
# plugins with the incorrect elasticsearch.version.
elasticsearch.version=${elasticsearch.version}
#
### deprecated elements for jvm plugins :
#
# 'isolated': true if the plugin should have its own classloader.
# passing false is deprecated, and only intended to support plugins
# that have hard dependencies against each other. If this is
# not specified, then the plugin is isolated by default.
isolated=${elasticsearch.plugin.isolated}
#

View File

@ -1,4 +0,0 @@
grant {
// needed because of the hot reload functionality
permission java.net.SocketPermission "*", "connect,resolve";
};