Find exact match phrase in ElasticSearch

I have the following Elasticsearch query:

"query": {
  "bool": {
    "must": [
      {
        "nested": {
          "path": "specs",
          "query": {
            "bool": {
              "must": [
                {
                  "match": {
                    "specs.battery": "2 hours"
                  }
                }
              ],
              "minimum_should_match": 1
            }
          }
        }
      },
      {
        "terms": {
          "category_ids": [
            16405
          ]
        }
      }
    ]
  }
}

At the moment it returns all documents that have either 2 or hours in the specs.battery value. How can I modify this query so that it only returns documents that have the exact phrase 2 hours in the specs.battery field? I would also like to be able to search for multiple phrases (2hrs, 2hours, 3 hours, etc.). Is this achievable?

By default, Elasticsearch tokenizes data when you index it. Indexing the expression "2 hours" therefore produces two tokens, 2 and hours, both mapped to the same document. There is no single token "2 hours", so a match query searches for 2 or hours, and a filtered query on the whole phrase finds nothing at all.
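The effect of tokenization can be illustrated with a small Python simulation. This is only a rough stand-in for the default analyzer (lowercase, then split on anything that is not a letter or digit), not Elasticsearch's actual implementation. The function names `analyze`, `match_or` and `term` are made up for this sketch.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def match_or(field_value, query):
    """Mimics a match query with the default "or" operator:
    the query is analyzed and any one of its tokens may match."""
    indexed = set(analyze(field_value))
    return any(tok in indexed for tok in analyze(query))

def term(field_value, query):
    """Mimics a term lookup on an analyzed field: the query is NOT analyzed,
    so it must be equal to one of the indexed tokens."""
    return query in analyze(field_value)

print(analyze("2 hours"))                # the phrase becomes two separate tokens
print(match_or("12 hours", "2 hours"))   # matches, because "hours" is shared
print(term("2 hours", "2 hours"))        # no single token equals "2 hours"
```

The second call shows why the original query returns too much, and the third shows why an exact lookup on an analyzed field returns nothing.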

To have Elasticsearch treat "2 hours" as one expression, you need to define specs.battery as not_analyzed in your mapping, as follows:

curl -XPOST localhost:9200/your_index -d '{
    "mappings" : {
        "your_index_type" : {
            "properties" : {
                ...
                "battery" : { "type" : "string", "index":"not_analyzed" }
                ...
            }
        }
    }
}'

Then you can get an exact match using a filtered query, as follows:

curl -XGET 'http://localhost:9200/_all/_search?pretty=true' -d '
{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "battery": "2 hours"
                }
            }
        }
    }
}'

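The term filter compares the query string against the stored value as a whole, so only documents whose battery value is exactly "2 hours" are returned.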

More details at: https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_exact_values.html
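The mapping and the filtered query above use the syntax of older Elasticsearch releases. From 5.0 onward, a `string` field with `not_analyzed` corresponds to the `keyword` type, and the `filtered` query was removed in favour of a `bool` query with a `filter` clause. The sketch below builds the equivalent request bodies as Python dicts without sending them anywhere. It uses a `terms` filter so that several exact values can be accepted at once, which is one way to cover the "multiple phrases" part of the question. The helper name `exact_values_query` is hypothetical, and the field name `battery` follows the answer; with the nested `specs` object from the question, the filter would need to sit inside a `nested` query.

```python
import json

# Mapping fragment for Elasticsearch 5.0+: a "keyword" field is indexed as one unanalyzed token.
mapping_properties = {"battery": {"type": "keyword"}}

def exact_values_query(field, values):
    """Build a search body matching documents whose field equals any of the given values exactly."""
    return {"query": {"bool": {"filter": [{"terms": {field: list(values)}}]}}}

body = exact_values_query("battery", ["2 hours", "2hrs", "2hours", "3 hours"])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the request body of a `_search` call in place of the filtered query.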

If, on the other hand, you absolutely need your field to be analyzed, or you are working with an existing index that you can't change, you can still use the "and" operator, as follows:

curl -XGET 'localhost:9200/your_index/_search' -d '
{
    "query": {
        "match": {
            "battery": {
                "query": "2 hours",
                "operator": "and"
            }
        }
    }
}'

Note that with this last option a document containing "2 hours and something else" will still match, because the query only requires both tokens to be present. It is therefore not as precise as a "not_analyzed" field.

More details on the last topic at:

https://www.elastic.co/guide/en/elasticsearch/guide/current/match-multi-word.html
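Another option for an analyzed field, not used in the answer above, is the `match_phrase` query, which requires the query tokens to appear next to each other and in the same order. A longer value such as "2 hours and something else" still matches, so it is also less strict than a "not_analyzed" field. The sketch below rebuilds the nested query from the question with one `match_phrase` clause per phrase. The helper name `nested_phrase_query` is hypothetical, and the phrase list is only an example.

```python
import json

def nested_phrase_query(path, field, phrases):
    """Build a nested query that matches when the field contains any one of the phrases."""
    should = [{"match_phrase": {f"{path}.{field}": phrase}} for phrase in phrases]
    return {
        "nested": {
            "path": path,
            # At least one of the phrase clauses has to match.
            "query": {"bool": {"should": should, "minimum_should_match": 1}},
        }
    }

body = {
    "query": {
        "bool": {
            "must": [
                nested_phrase_query("specs", "battery", ["2 hours", "2hrs", "3 hours"]),
                {"terms": {"category_ids": [16405]}},
            ]
        }
    }
}
print(json.dumps(body, indent=2))
```

Using `should` with `minimum_should_match` set to 1 means a document qualifies as soon as any one phrase is found, while the `category_ids` restriction from the original query stays in place.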
