How to create and add a custom analyzer in Elasticsearch?
I have a batch of "smartphones" products in my ES index and I need to find them when querying with the text "smart phone". So I'm looking into the compound word token filter. Specifically, I'm planning to use a custom filter like this:
curl -XPUT 'localhost:9200/_all/_settings' -d '
{
  "analysis" : {
    "analyzer" : {
      "second" : {
        "type" : "custom",
        "tokenizer" : "standard",
        "filter" : ["myFilter"]
      }
    },
    "filter" : {
      "myFilter" : {
        "type" : "dictionary_decompounder",
        "word_list" : ["smart", "phone"]
      }
    }
  }
}
'
Is this the correct approach? Also, how can I create and add the custom analyzer to ES? I looked into several links but couldn't figure out how to do it. I guess I'm looking for the correct syntax. Thank you.
EDIT
I'm running version 1.4.5, and I verified that the custom analyzer was added successfully:
{
  "test_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1453761455612",
        "analysis" : {
          "filter" : {
            "myFilter" : {
              "type" : "dictionary_decompounder",
              "word_list" : [ "smart", "phone" ]
            }
          },
          "analyzer" : {
            "second" : {
              "type" : "custom",
              "filter" : [ "lowercase", "myFilter" ],
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1040599"
        },
        "uuid" : "xooKEdMBR260dnWYGN_ZQA"
      }
    }
  }
}
Your approach looks good. I would also consider adding the lowercase token filter, so that even Smartphone (notice the uppercase 'S') will be split into smart and phone.
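To see why the lowercase filter matters, here is a minimal Python sketch of the idea behind dictionary decompounding. It is a simplified stand-in, not Elasticsearch's actual implementation: the regex tokenizer, the function names decompound and analyze, and the default size limits are assumptions made for illustration.

```python
import re

WORD_LIST = {"smart", "phone"}  # same entries as the word_list in myFilter


def decompound(token, words, min_word_size=5, min_subword_size=2, max_subword_size=15):
    """Return the token followed by every dictionary word found inside it."""
    out = [token]
    if len(token) < min_word_size:
        return out  # short tokens are passed through untouched
    for start in range(len(token) - min_subword_size + 1):
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(token):
                break
            part = token[start:start + size]
            if part in words:  # exact, case-sensitive lookup
                out.append(part)
    return out


def analyze(text, lowercase=True):
    """Crude stand-in for: standard tokenizer -> lowercase -> myFilter."""
    tokens = re.findall(r"\w+", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    result = []
    for t in tokens:
        result.extend(decompound(t, WORD_LIST))
    return result


print(analyze("LG Android Smartphone"))
print(analyze("LG Android Smartphone", lowercase=False))
```

In this sketch, the lowercased run yields lg, android, smartphone, smart and phone, while the run without lowercasing only finds phone inside Smartphone, because "Smart" does not match the dictionary entry "smart".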
Then you could create an index with the analyzer like this:
curl -XPUT 'localhost:9200/your_index' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "second": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "myFilter"
          ]
        }
      },
      "filter": {
        "myFilter": {
          "type": "dictionary_decompounder",
          "word_list": [
            "smart",
            "phone"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "second"
        }
      }
    }
  }
}
'
Here you are creating an index named your_index, defining a custom analyzer named second, and applying it to the name field.
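The command above creates a new index. For an index that already exists, analysis settings can, as far as I know, only be changed while the index is closed. Below is a minimal Python sketch of that close, update, reopen sequence. The helper names build_steps and run, the base URL, and the Content-Type header are assumptions for illustration and are not part of the original answer.

```python
import json
import urllib.request

# Same analyzer and filter definitions as in the index-creation request.
ANALYSIS = {
    "analysis": {
        "analyzer": {
            "second": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "myFilter"],
            }
        },
        "filter": {
            "myFilter": {
                "type": "dictionary_decompounder",
                "word_list": ["smart", "phone"],
            }
        },
    }
}


def build_steps(index, base="http://localhost:9200"):
    """List the (method, url, body) calls: close, update settings, reopen."""
    return [
        ("POST", f"{base}/{index}/_close", None),
        ("PUT", f"{base}/{index}/_settings", json.dumps(ANALYSIS)),
        ("POST", f"{base}/{index}/_open", None),
    ]


def run(steps):
    """Send each call in order; needs a reachable Elasticsearch node."""
    for method, url, body in steps:
        data = body.encode("utf-8") if body is not None else None
        req = urllib.request.Request(
            url,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(method, url, resp.status)


if __name__ == "__main__":
    # Dry run: only print the planned calls. Use run(build_steps(...)) to apply them.
    for method, url, _ in build_steps("your_index"):
        print(method, url)
```

Documents indexed before the change are not re-analyzed, so they would need to be reindexed before the new analyzer has any effect on them.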
You can check whether the analyzer is working as expected with the analyze API, like this:
curl -XGET 'localhost:9200/your_index/_analyze' -d '
{
"analyzer" : "second",
"text" : "LG Android smartphone"
}'
Hope this helps!