
Elasticsearch: filtering/searching by part of a phrase

I need to construct an Elasticsearch request that searches by part of a phrase: a case-insensitive search for a sequence of words.

For example, a record field contains:

Lorem ipsum dolor sit amet, eam et gubergren vulputate

I need to find this record with any of the following search terms:

Lorem ipsum
Lorem     ipsum dolor
lorem, ipsum.dolor
dolor sit amet

Previously I used a strict (exact) search. My solution was to create a custom analyzer (Tokenizer = "keyword" and Filter = ["lowercase"]), assign it to the field, and set the field's index option to analyzed in the mapping. But now the task has changed.

Can anybody help me construct the request? Even a pointer to the relevant Elasticsearch API reference would be welcome.

Check out the _analyze API.

By using the custom analyzer you describe (keyword tokenizer plus lowercase filter), you are creating a single, large token:

$ curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase&text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
   "tokens": [
      {
         "token": "lorem ipsum dolor sit amet, eam et gubergren vulputate",
         "start_offset": 0,
         "end_offset": 54,
         "type": "word",
         "position": 1
      }
   ]
}

The only way to find that token is to search for exactly the same token (after analysis, if the search text is analyzed).
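To make the consequence concrete, here is a minimal offline sketch in Python. It does not call Elasticsearch; `keyword_lowercase_analyze` and `finds_record` are hypothetical helpers that only imitate what a keyword tokenizer plus lowercase filter would do, under the assumption that the same analyzer is applied to the search text.

```python
def keyword_lowercase_analyze(text):
    """Rough stand-in for a keyword tokenizer followed by a lowercase filter."""
    # The keyword tokenizer emits the whole input as one token;
    # the lowercase filter then lowercases that single token.
    return [text.lower()]


DOC = "Lorem ipsum dolor sit amet, eam et gubergren vulputate"
INDEXED = keyword_lowercase_analyze(DOC)


def finds_record(search_text):
    """True only when the analyzed search text equals the single indexed token."""
    return keyword_lowercase_analyze(search_text) == INDEXED


# Only the complete text (in any letter case) hits; partial phrases do not.
print(finds_record("LOREM IPSUM DOLOR SIT AMET, EAM ET GUBERGREN VULPUTATE"))
print(finds_record("Lorem ipsum"))
print(finds_record("dolor sit amet"))
```

With this analyzer, none of the partial search terms from the question can match, which is why a different analysis chain is needed.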

However, if you did not use a custom analyzer at all, you would get these tokens:

$ curl -XGET 'localhost:9200/_analyze?text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
   "tokens": [
      {
         "token": "lorem",
         "start_offset": 0,
         "end_offset": 5,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "ipsum",
         "start_offset": 6,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "dolor",
         "start_offset": 12,
         "end_offset": 17,
         "type": "<ALPHANUM>",
         "position": 3
      },
      {
         "token": "sit",
         "start_offset": 18,
         "end_offset": 21,
         "type": "<ALPHANUM>",
         "position": 4
      },
      {
         "token": "amet",
         "start_offset": 22,
         "end_offset": 26,
         "type": "<ALPHANUM>",
         "position": 5
      },
      {
         "token": "eam",
         "start_offset": 28,
         "end_offset": 31,
         "type": "<ALPHANUM>",
         "position": 6
      },
      {
         "token": "et",
         "start_offset": 32,
         "end_offset": 34,
         "type": "<ALPHANUM>",
         "position": 7
      },
      {
         "token": "gubergren",
         "start_offset": 35,
         "end_offset": 44,
         "type": "<ALPHANUM>",
         "position": 8
      },
      {
         "token": "vulputate",
         "start_offset": 45,
         "end_offset": 54,
         "type": "<ALPHANUM>",
         "position": 9
      }
   ]
}

Now you can search for any word in the "sentence" and find matches, including with a phrase search.
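As a rough illustration of why per-word tokens with positions make phrase search possible, the sketch below imitates the idea in plain Python. It is an approximation, not Elasticsearch's implementation: `simple_analyze` is a hypothetical helper that splits on every non-alphanumeric character, whereas the real standard tokenizer follows Unicode word-boundary rules and may keep something like `ipsum.dolor` as one token, so it is worth checking your actual search terms with the _analyze API.

```python
import re


def simple_analyze(text):
    """Simplified analyzer: split on non-alphanumerics, then lowercase.

    Assumption: this is only a stand-in for the default analyzer.
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_matches(doc_text, search_text):
    """True when the search tokens appear in the document consecutively, in order."""
    doc = simple_analyze(doc_text)
    query = simple_analyze(search_text)
    if not query:
        return False
    n = len(query)
    # Slide a window over the document tokens and compare it to the query tokens.
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


DOC = "Lorem ipsum dolor sit amet, eam et gubergren vulputate"
for term in ["Lorem ipsum", "Lorem     ipsum dolor", "lorem, ipsum.dolor", "dolor sit amet"]:
    print(repr(term), phrase_matches(DOC, term))
```

Under this simplified tokenizer, extra whitespace, punctuation, and letter case in the search term stop mattering, while word order still does.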

Put more simply, you want to search with a match query to get the benefits of full-text search, because it applies the same analyzer to the search terms. If you use a term query (or filter), it only looks at the exact tokens.
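The difference can be sketched offline as follows. This is a toy model, not the Elasticsearch API: `INDEXED_TOKENS` is the token list shown above, `analyze` is a hypothetical simplified analyzer, and `term_hit` / `match_hit` only imitate how an unanalyzed lookup differs from an analyzed one.

```python
import re

# Tokens the default analyzer produced for the example sentence (see the output above).
INDEXED_TOKENS = {
    "lorem", "ipsum", "dolor", "sit", "amet",
    "eam", "et", "gubergren", "vulputate",
}


def analyze(text):
    """Simplified stand-in for the search-time analyzer (an assumption)."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def term_hit(value):
    """Term-style lookup: the value is compared to the indexed tokens as-is."""
    return value in INDEXED_TOKENS


def match_hit(text):
    """Match-style lookup: the text is analyzed first, then any token may hit."""
    return any(tok in INDEXED_TOKENS for tok in analyze(text))


print(term_hit("Lorem"))        # not analyzed, so the capital L prevents a hit
print(term_hit("lorem"))        # exact token
print(match_hit("Lorem"))       # analyzed to "lorem" before the lookup
print(term_hit("lorem ipsum"))  # no single token equals the whole string
```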

So, without using any custom analyzer at all, you should be able to use those searches as-is to find the text:

$ curl -XPOST 'localhost:9200/test/type' -d '{
  "field" : "Lorem ipsum dolor sit amet, eam et gubergren vulputate"
}'

By using a plain match query:

$ curl -XGET 'localhost:9200/test/_search' -d '{
  "query" : {
    "match" : {
      "field" : "lorem, ipsum.dolor"
    }
  }
}'
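Since the question asks for a sequence of words, it may help to note that a plain match query, by default, matches documents containing any of the analyzed terms. For ordered, adjacent words, the phrase search mentioned above is the closer fit. The sketch below only builds a `match_phrase` request body as JSON; `phrase_search_body` is a hypothetical helper, and the field name `field` is taken from the example document above.

```python
import json


def phrase_search_body(field, phrase):
    """Build a match_phrase request body for the given field and phrase."""
    return json.dumps({"query": {"match_phrase": {field: phrase}}}, indent=2)


body = phrase_search_body("field", "dolor sit amet")
print(body)
```

The printed JSON could be passed with `-d` to the same `localhost:9200/test/_search` endpoint used in the curl examples here.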
