Wildcard search or partial matching in Elasticsearch

I am trying to provide end users with search-as-you-type, much like what I can do in SQL Server. I was able to implement an ES query for this SQL scenario:

select * from table where name like '%pete%' and type != 'xyz' and type != 'abc'

But the ES query does not work for this SQL query:

select * from table where name like '%peter tom%' and type != 'xyz' and type != 'abc'

In my Elasticsearch query, along with the wildcard query, I also need to apply some boolean filters:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*pete*" }
                }
              }
            }
          ],
          "must_not": [
            {
              "match": { "type": "xyz" }
            },
            {
              "match": { "type": "abc" }
            }
          ]
        }
      }
    }
  }
}

The above Elasticsearch query with the wildcard search works fine and gets me all the documents that match pete and are not of type xyz or abc. But when I try to perform the wildcard search with two separate words separated by a space, the same query returns no results, as shown below. For example:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*peter tom*" }
                }
              }
            }
          ],
          "must_not": [
            {
              "match": { "type": "xyz" }
            },
            {
              "match": { "type": "abc" }
            }
          ]
        }
      }
    }
  }
}

My mapping is as follows:

{
  "properties": {
    "name": {
      "type": "string"
    },
    "type": {
      "type": "string"
    }
  }
}

What query should I use in order to make wildcard search possible for words separated by spaces?

The reason the second query returns nothing is that with the default string mapping the standard analyzer splits peter tomson into the separate terms peter and tomson, and a wildcard query is matched against individual terms, so no single indexed term contains the space in *peter tom*.

The most efficient solution involves leveraging an ngram tokenizer in order to tokenize portions of your name field. For instance, if you have a name like peter tomson, the ngram tokenizer will tokenize and index it like this:

  • pe
  • pet
  • pete
  • peter
  • peter t
  • peter to
  • peter tom
  • peter toms
  • peter tomso
  • eter tomson
  • ter tomson
  • er tomson
  • r tomson
  • tomson
  • tomson
  • omson
  • mson
  • son
  • on

So, once this has been indexed, searching for any of those tokens will retrieve your document with peter tomson in it.

Let's create the index:

PUT likequery
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "15"
        }
      }
    }
  },
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "search": {
              "type": "string",
              "analyzer": "my_ngram_analyzer"
            }
          }
        },
        "type": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
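Before wiring up the search, it can help to double-check which tokens the analyzer actually emits. A minimal sketch using the _analyze API (not part of the original answer; on some older versions the analyzer and text may need to be passed as query-string parameters instead of a JSON body):

POST likequery/_analyze
{
  "analyzer": "my_ngram_analyzer",
  "text": "peter tomson"
}

The response should list tokens such as pe, pet, ... and, importantly, peter tom, confirming that the multi-word fragment is indexed as a single term.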

You'll then be able to search like this with a simple and very efficient term query:

POST likequery/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name.search": "peter tom"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "type": "xyz"
          }
        },
        {
          "match": {
            "type": "abc"
          }
        }
      ]
    }
  }
}
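To try this end-to-end, you could index a sample document and re-run the query above (the document below and its type value are made up purely for illustration):

PUT likequery/typename/1
{
  "name": "peter tomson",
  "type": "something-else"
}

Since peter tom is one of the ngrams produced for peter tomson, the term query on name.search should return this document, while documents whose type is xyz or abc are still excluded by the must_not clauses.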

Well, my solution is not perfect and I am not sure about its performance, so you should try it at your own risk :)

This is the ES 5 version:

PUT likequery
{
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        },
        "type": {
          "type": "string"
        }
      }
    }
  }
}

In ES 2.1, change "type": "keyword" to "type": "string", "index": "not_analyzed".
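In other words, the full ES 2.1 mapping would look roughly like this (same structure as above, with only the raw sub-field changed as described):

PUT likequery
{
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "type": {
          "type": "string"
        }
      }
    }
  }
}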

PUT likequery/typename/1
{
  "name": "peter tomson"
}

PUT likequery/typename/2
{
  "name": "igor tkachenko"
}

PUT likequery/typename/3
{
  "name": "taras shevchenko"
}

The query is case-sensitive:

POST likequery/_search
{
  "query": {
    "regexp": {
      "name.raw": ".*taras shev.*"
    }
  }
}

Response:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "likequery",
        "_type": "typename",
        "_id": "3",
        "_score": 1,
        "fields": {
          "raw": [
            "taras shevchenko"
          ]
        }
      }
    ]
  }
}
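If case-insensitive matching is needed, one possible workaround (not from the original answer, just a sketch) is to spell out both cases with character classes in the pattern, since as far as I know the regexp query in these versions does not offer a case-insensitivity flag:

POST likequery/_search
{
  "query": {
    "regexp": {
      "name.raw": ".*[Tt]aras [Ss]hev.*"
    }
  }
}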

PS: Once again, I am not sure about the performance of this query, since it will use a scan rather than the index.
