
Elasticsearch partial match but strict phrase matching

I'm looking for a way to do fuzzy partial matching against a field where the words match, but I also want to add strict phrase matching.

For example, say I have field values such as:

foo bar
bar foo

I would like to achieve the following search behaviour:

  • If I search foo, I would like to return both results.

  • If I search ba, I would like to return both results.

  • If I search bar foo, I would like to return only one result.

  • If I search bar foo foo, I don't want to return any results.

I would also like to add single-character fuzziness matching, so that if foo is mistyped as fbo, both results are still returned.

My current search and index analyzer uses an edge_ngram tokenizer and works fairly well, except that if any gram matches, the result is returned regardless of whether the following words match. For example, the search bar foo buzz returns both of the following results:

foo bar
bar foo
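To illustrate why the extra word buzz does not filter anything out, here is a minimal Python sketch that approximates the analyzer. It assumes whitespace splitting stands in for the edge_ngram tokenizer's token_chars handling, and it ignores the extra subfields a search_as_you_type field creates. The function names edge_ngrams and any_gram_matches are hypothetical, not Elasticsearch APIs.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Approximate an edge_ngram tokenizer followed by a lowercase filter."""
    grams = []
    for word in text.lower().split():
        # Emit every prefix of the word from min_gram up to max_gram characters.
        for size in range(min_gram, min(len(word), max_gram) + 1):
            grams.append(word[:size])
    return grams


def any_gram_matches(query, document):
    """Model a match where a single shared gram is enough (OR semantics)."""
    return bool(set(edge_ngrams(query)) & set(edge_ngrams(document)))


for doc in ("foo bar", "bar foo"):
    print(doc, any_gram_matches("bar foo buzz", doc))
```

Because the grams of bar and foo are shared with both documents, the grams of buzz never need to match anything.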

My tokenizer:

ngram_tokenizer: {
   type: "edge_ngram",
   min_gram: "2",
   max_gram: "15",
   token_chars: ['letter', 'digit', 'punctuation', 'symbol'],
},
          

My analyzer:

nGram_analyzer: {
  filter: [
    "lowercase",
    "asciifolding"
  ],
  type: "custom",
  tokenizer: "ngram_tokenizer"
},

My field mapping:


type: "search_as_you_type",
doc_values: false,
max_shingle_size: 3,
analyzer: "nGram_analyzer"
          

One way to achieve all your requirements is to use the span_near query.

Span near queries are much more verbose, but they are suitable for phrase matching combined with the fuzziness parameter.
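Since the queries in this answer all share one shape (one fuzzy span_multi clause per search word, wrapped in span_near with slop 0 and in_order true), the request body can be generated from the search string. This is a rough sketch only: build_span_near_query is a hypothetical helper, and lowercasing the input is an assumption made because fuzzy is a term-level query that is not analyzed, while the default analyzer on a text field lowercases indexed tokens.

```python
import json


def build_span_near_query(text, field="title", fuzziness=2):
    """Build a span_near body with one fuzzy span_multi clause per word."""
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": word, "fuzziness": fuzziness}}
                }
            }
        }
        # Assumption: split on whitespace and lowercase to mirror indexed tokens.
        for word in text.lower().split()
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    # slop 0 + in_order means the words must be adjacent and ordered.
                    {"span_near": {"clauses": clauses, "slop": 0, "in_order": True}}
                ]
            }
        }
    }


print(json.dumps(build_span_near_query("bar foo"), indent=2))
```

Calling it with "bar foo" produces a two-clause query for the case where only one document should come back. Passing fuzziness=1 would be closer to the single-character fuzziness asked for in the question.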

Here is a working example with index data, search queries and search results.

Index Mapping:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}

Index Data:

{
    "title":"bar foo"
}
{
    "title":"foo bar"
}

Search Queries:

If I search foo, I would like to return both results.

{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "foo",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
          "title": "bar foo"
        }
      },
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "title": "foo bar"
        }
      }
    ]

If I search ba, I would like to return both results.

{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "ba",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
          "title": "bar foo"
        }
      },
      {
        "_index": "67205552",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
          "title": "foo bar"
        }
      }
    ]

If I search bar foo foo, I don't want to return any results.

{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "bar",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              },
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "foo",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              },
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "title": {
                        "value": "foo",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}

The search result will be empty.
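To sanity-check the expected behaviour without a cluster, the matching rule can be modelled in plain Python. This is only a simplified model, not how Elasticsearch evaluates spans: it assumes whitespace tokenization, uses plain Levenshtein distance (no transpositions), and treats slop 0 with in_order true as "query words must match consecutive document words". The names edit_distance and phrase_matches are hypothetical.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def phrase_matches(query, document, fuzziness=2):
    """True if the query words fuzzily match consecutive document words, in order."""
    q_words = query.lower().split()
    d_words = document.lower().split()
    if not q_words:
        return False
    # Slide a window of len(q_words) over the document tokens.
    for start in range(len(d_words) - len(q_words) + 1):
        window = d_words[start:start + len(q_words)]
        if all(edit_distance(q, d) <= fuzziness for q, d in zip(q_words, window)):
            return True
    return False


docs = ["foo bar", "bar foo"]
for query in ("foo", "ba", "fbo", "bar foo", "bar foo foo"):
    print(query, [d for d in docs if phrase_matches(query, d)])
```

Under this model, foo, ba and fbo match both documents, bar foo matches only the document "bar foo", and bar foo foo matches nothing, since a three-word phrase cannot fit in a two-word field.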
