
ElasticSearch inconsistent wildcard search

I have a strange issue with my wildcard search. I've created an index with the following mapping: [image: index mapping]

I have the following document there: [image: indexed document]

When I run the following query, I get the document:

{
  "query": {
    "wildcard" : { "email" :  "*asdasd*"  }
  },
  "size": "10",
  "from": 0
}

But when I run the next request, I get nothing:

{
  "query": {
    "wildcard" : { "email" :  "*one-v*"  }
  },
  "size": "10",
  "from": 0
}

Can you please explain the reason for it? Thank you

Elasticsearch uses the standard analyzer if no analyzer is specified. Assuming the email field is of type text, "asdasd@one-v.co.il" is tokenized into:

{
  "tokens": [
    {
      "token": "asdasd",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "one",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "v.co.il",
      "start_offset": 11,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

A wildcard query is a term-level query: the pattern is not analyzed, and it is matched against the individual tokens created above. The pattern *asdasd* matches the token asdasd, but no single token matches *one-v*, so the second query returns no results.
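To see why the second query comes back empty, the matching can be simulated outside Elasticsearch. This is only a sketch, not Elasticsearch code: it assumes the three tokens listed above and uses Python's `fnmatch` as a stand-in for term-level wildcard matching, where the pattern has to match one whole indexed term. The helper name `wildcard_hits` is made up for illustration.

```python
from fnmatch import fnmatchcase

# Terms the standard analyzer produced for "asdasd@one-v.co.il"
# (copied from the token output above).
indexed_terms = ["asdasd", "one", "v.co.il"]


def wildcard_hits(pattern, terms):
    """Return the terms that the wildcard pattern matches in full.

    fnmatchcase is only a stand-in for Elasticsearch's wildcard matching;
    both treat * as "any run of characters".
    """
    return [term for term in terms if fnmatchcase(term, pattern)]


print(wildcard_hits("*asdasd*", indexed_terms))  # one matching term
print(wildcard_hits("*one-v*", indexed_terms))   # no matching term

# If the whole value is indexed as a single term, the same pattern matches.
print(wildcard_hits("*one-v*", ["asdasd@one-v.co.il"]))
```

The last call hints at the fix: the pattern only has a chance when the hyphenated part of the address survives inside one indexed term.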

It is better to use a keyword field for wildcard queries. If you have not explicitly defined an index mapping, dynamic mapping will also have created a keyword sub-field, so you can query email.keyword instead of email. That sub-field is not analyzed: the whole value is indexed as a single term (notice the ".keyword" after the email field).

Modify your query as shown below:

{
  "query": {
    "wildcard": {
      "email.keyword": "*one-v*"
    }
  }
}

The search result will be:

"hits": [
      {
        "_index": "67688032",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "email": "asdasd@one-v.co.il"
        }
      }
    ]

Otherwise, you need to change the data type of the email field from text to keyword.
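For reference, the two mapping options mentioned in this answer can be written out as request bodies. This is a sketch under assumptions: the original mapping was posted as an image and is not reproduced here, so only the `email` property is shown, and the variable names are made up. The first body mirrors what dynamic mapping generates for a string value (a `text` field with a `keyword` sub-field); the second maps `email` directly as `keyword`. Either body would be sent when creating the index.

```python
import json

# Option 1: text field plus a keyword sub-field, queried as "email.keyword".
# This mirrors the default dynamic mapping for string values.
multi_field_mapping = {
    "mappings": {
        "properties": {
            "email": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword", "ignore_above": 256}
                },
            }
        }
    }
}

# Option 2: the field itself is a keyword, queried as plain "email".
keyword_mapping = {
    "mappings": {"properties": {"email": {"type": "keyword"}}}
}

# Print both bodies as JSON, ready to paste into a create-index request.
print(json.dumps(multi_field_mapping, indent=2))
print(json.dumps(keyword_mapping, indent=2))
```

The type of an existing field cannot be changed in place, so switching to the second option means creating a new index with that mapping and reindexing the documents.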

This has to do with how text fields are stored. By default, the standard analyzer is used.

This is an example from the documentation which fits your case too:

The text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." is broken into the terms [ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ].

As you can see, Brown-Foxes is not a single token. The same goes for one-v: it is split at the hyphen, so no single token contains one-v.
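The splitting can be approximated in a few lines. The regex below is a crude stand-in, not the Unicode text segmentation rules the standard tokenizer actually follows; it is tuned only to reproduce the two examples discussed here (lowercase everything, keep `.` and `'` between alphanumerics, split on everything else). The function name `rough_standard_tokens` is made up.

```python
import re

# Crude approximation: runs of letters/digits, optionally joined by a single
# "." or "'" when another letter/digit follows. Hyphens, "@" and spaces split.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:['.][a-z0-9]+)*")


def rough_standard_tokens(text):
    """Lowercase the text and split it roughly the way the examples show."""
    return TOKEN_RE.findall(text.lower())


print(rough_standard_tokens("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
print(rough_standard_tokens("asdasd@one-v.co.il"))
```

For the email address this yields asdasd, one and v.co.il, so the hyphen is exactly where the value falls apart. For real data, the `_analyze` API is the reliable way to check what a given analyzer produces.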
