
How to create and add a custom analyzer in Elasticsearch?

I have a batch of "smartphones" products in my ES, and I need to query them using the text "smart phone". So I am looking into the compound word token filter. Specifically, I am planning to use a custom filter like this:

curl -XPUT 'localhost:9200/_all/_settings' -d '
{
  "analysis": {
    "analyzer": {
      "second": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["myFilter"]
      }
    },
    "filter": {
      "myFilter": {
        "type": "dictionary_decompounder",
        "word_list": ["smart", "phone"]
      }
    }
  }
}
'

Is this the correct approach? Also, I'd like to ask how I can create the custom analyzer and add it to ES. I looked into several links but couldn't figure out how to do it. I guess I'm looking for the correct syntax. Thank you.

EDIT

I'm running version 1.4.5, and I verified that the custom analyzer was added successfully:

{
  "test_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1453761455612",
        "analysis" : {
          "filter" : {
            "myFilter" : {
              "type" : "dictionary_decompounder",
              "word_list" : [ "smart", "phone" ]
            }
          },
          "analyzer" : {
            "second" : {
              "type" : "custom",
              "filter" : [ "lowercase", "myFilter" ],
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1040599"
        },
        "uuid" : "xooKEdMBR260dnWYGN_ZQA"
      }
    }
  }
}

Your approach looks good. I would also consider adding the lowercase token filter, so that even Smartphone (note the uppercase 'S') will be split into smart and phone.
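To make the behavior of that filter chain concrete, here is a rough Python sketch (illustrative only, not Elasticsearch code): the lowercase filter normalizes the token, and the dictionary decompounder keeps the original token while additionally emitting every dictionary word it finds inside it.

```python
# Illustrative sketch of the "lowercase" + "dictionary_decompounder" filter
# chain. This is NOT Elasticsearch code, just a model of its behavior.

WORD_LIST = ["smart", "phone"]

def lowercase_filter(tokens):
    # The lowercase token filter simply lowercases each token.
    return [t.lower() for t in tokens]

def dictionary_decompounder(tokens, word_list):
    # The decompounder keeps the original token and also emits every
    # dictionary word that appears as a substring of that token.
    out = []
    for token in tokens:
        out.append(token)
        for word in word_list:
            if word in token and word != token:
                out.append(word)
    return out

tokens = ["Smartphone"]  # what the standard tokenizer produced
tokens = lowercase_filter(tokens)
tokens = dictionary_decompounder(tokens, WORD_LIST)
print(tokens)  # ['smartphone', 'smart', 'phone']
```

So "Smartphone" ends up indexed as smartphone, smart, and phone, which is exactly what lets a "smart phone" query hit it.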

Then you can create an index with the analyzer like this:

curl -XPUT 'localhost:9200/your_index' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "second": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "myFilter"
          ]
        }
      },
      "filter": {
        "myFilter": {
          "type": "dictionary_decompounder",
          "word_list": [
            "smart",
            "phone"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "second"
        }
      }
    }
  }
}
'

Here you are creating an index named your_index with a custom analyzer named second, and applying it to the name field.
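To see why this makes the original query work: at index time "smartphones" is decompounded into extra tokens, and at search time "smart phone" is split into smart and phone, which now overlap with the indexed tokens. A rough sketch of that overlap (illustrative only, not Elasticsearch internals):

```python
# Illustrative model of why the query now matches. This is NOT
# Elasticsearch code; it only mimics the analyzer's token output.

WORD_LIST = ["smart", "phone"]

def analyze(text, word_list):
    # Stand-in for: standard tokenizer + lowercase + dictionary_decompounder.
    tokens = []
    for token in text.lower().split():
        tokens.append(token)  # the original token is always kept
        tokens.extend(w for w in word_list if w in token and w != token)
    return tokens

indexed = analyze("smartphones", WORD_LIST)  # ['smartphones', 'smart', 'phone']
query = analyze("smart phone", WORD_LIST)    # ['smart', 'phone']

# A match query succeeds when query tokens appear among the indexed tokens.
print(set(query) & set(indexed))  # {'smart', 'phone'}
```

Every query token is now present in the indexed tokens, so the match query for "smart phone" finds the "smartphones" document.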

You can check whether the analyzer works as expected with the analyze API, like this:

curl -XGET 'localhost:9200/your_index/_analyze?analyzer=second' -d 'LG Android smartphone'

(On Elasticsearch 1.x, the analyze API takes the analyzer name as a query parameter and the text as the raw request body.)

Hope this helps!
