
Fuzzy search in ElasticSearch doesn't work with spaces

I'm using the fuzzy search option in ElasticSearch. It's pretty cool.

But I came across an issue when searching for values that contain spaces. For example, say I have two values:

"Pizza"
"Pineapple Pizza"

and I search for Pizza using this query:

        client.search({
            index: 'food_index',
            body: {
                query: {
                    fuzzy: {
                        name: {
                            value: "Pizza",
                            transpositions: true,
                        }
                    },
                }
            }
        })

The values returned are:

"Pizza"
"Pineapple Pizza"

Which is expected. But if I enter the value "Pineapple Pizza" in my query:

        client.search({
            index: 'food_index',
            body: {
                query: {
                    fuzzy: {
                        name: {
                            value: "Pineapple Pizza",
                            transpositions: true,
                        }
                    },
                }
            }
        })

The values returned are:

""

Empty

Why is that? It should be an exact match. I'm contemplating replacing the spaces in all names with underscores, so "Pineapple Pizza" would become "Pineapple_Pizza" (this workaround works for me). But I'm asking this question in the hope of finding a better alternative. What am I doing wrong here?

Fuzzy queries are term-level queries, which means the search text is not analyzed before it is matched against the index. In your case the standard analyzer is applied to the field name, so "Pineapple Pizza" is indexed as two separate lowercase tokens, pineapple and pizza. The fuzzy query tries to match the whole search text "Pineapple Pizza" against a single similar term in the index, and there is no term for the whole phrase (it was split into two words). Fuzzy matching allows at most two edits, so neither token is close enough to the full phrase.
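To make the explanation above concrete, here is a small self-contained sketch that imitates what happens, without needing a cluster. It is only an approximation: `analyze` is a simplified stand-in for the standard analyzer, `editDistance` is plain Levenshtein (Elasticsearch also counts transpositions as one edit), and all function names are made up for illustration.

```javascript
// Simplified stand-in for the standard analyzer (assumption):
// lowercase the text and split it on anything that is not a letter or digit.
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Plain Levenshtein edit distance (insert, delete, substitute).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Term-level fuzzy lookup: the search value is NOT analyzed,
// it is compared as-is against each indexed term.
function fuzzyTermMatches(indexedTerms, value, maxEdits = 2) {
  return indexedTerms.filter((term) => editDistance(term, value) <= maxEdits);
}

// What ends up in the index for the document "Pineapple Pizza".
const indexedTerms = analyze("Pineapple Pizza"); // two terms: pineapple, pizza

console.log(fuzzyTermMatches(indexedTerms, "Pizza"));           // one hit: pizza (1 edit, P vs p)
console.log(fuzzyTermMatches(indexedTerms, "Pineapple Pizza")); // no hits
```

The second lookup comes back empty because the unanalyzed value "Pineapple Pizza" is many edits away from both `pineapple` and `pizza`.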

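You need to use a match query with fuzziness set, because a match query analyzes the query string before matching (the examples here use a test index whose field is called item; substitute name for your mapping):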

{
  "query": {
        "match" : {
            "item" : {
                "query" : "Pineappl piz",
                "fuzziness": "auto"
            }
        }
    }
}

Response:

 [
      {
        "_index" : "index27",
        "_type" : "_doc",
        "_id" : "p9qQDG4BLLIhDvFGnTMX",
        "_score" : 0.53372335,
        "_source" : {
          "item" : "Pineapple Pizza"
        }
      }
    ]
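Applied to the client call from the question, the same query could look roughly like the sketch below. The helper name `buildFuzzyMatchBody` is hypothetical; the index `food_index` and field `name` are taken from the question, and the actual `client.search` call is left commented out because it needs a running cluster.

```javascript
// Hypothetical helper: builds the request body for a match query with fuzziness.
// Unlike a fuzzy query, a match query analyzes `text`, so each word is
// fuzzily matched against the indexed tokens separately.
function buildFuzzyMatchBody(field, text) {
  return {
    query: {
      match: {
        [field]: {
          query: text,
          fuzziness: "AUTO",
        },
      },
    },
  };
}

const body = buildFuzzyMatchBody("name", "Pineapple Pizza");
console.log(JSON.stringify(body, null, 2));

// With the client from the question (requires a running cluster):
// client.search({ index: "food_index", body });
```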

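You can also use a fuzzy query on a keyword field, which stores the entire text as a single term in the index (with the default dynamic mapping, a text field gets a .keyword sub-field). Keep in mind that keyword terms are not lowercased, so a difference in case counts as an edit: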

{
  "query": {
    "fuzzy": {
      "item.keyword": {
        "value":"Pineapple pizz"
      }
    }
  }
}

EDIT1:

{
  "query": {
        "match" : {
            "item" : {
                "query" : "Pineapple pizza",
                "operator": "and",
                "fuzziness": "auto"
            }
        }
    }
}

"operator": "and" --> all the tokens in the query must be present in the document. The default is OR: a document matches if any one of the tokens is present. Other combinations are possible, where you define how many tokens must match as a percentage (the minimum_should_match parameter).

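The difference between the two operators can be illustrated with a small in-memory sketch. It is not how Elasticsearch is implemented: `tokenize` is a simplified stand-in for the analyzer, fuzziness is ignored for brevity, and `matchDocs` is a made-up name.

```javascript
// Simplified analyzer stand-in (assumption): lowercase, split on non-alphanumerics.
const tokenize = (text) => text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Returns the documents that match `query` under the given operator.
// "or":  at least one query token must appear in the document.
// "and": every query token must appear in the document.
function matchDocs(docs, query, operator = "or") {
  const queryTokens = tokenize(query);
  return docs.filter((doc) => {
    const docTokens = new Set(tokenize(doc));
    const hit = (token) => docTokens.has(token);
    return operator === "and" ? queryTokens.every(hit) : queryTokens.some(hit);
  });
}

const docs = ["Pizza", "Pineapple Pizza"];

console.log(matchDocs(docs, "Pineapple pizza", "or"));  // both documents
console.log(matchDocs(docs, "Pineapple pizza", "and")); // only "Pineapple Pizza"
```

With OR, the document "Pizza" is returned as well because it contains one of the two query tokens; with AND it is filtered out.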