Preserving the order of terms in an ElasticSearch query

Question

Is it possible in ElasticSearch to form a query that preserves the order of the terms?

A simple example would be having these documents indexed with the standard analyzer:

You know search
You know search
Know search you

I can query for +you +search, and this returns all of the documents, including the third one.

What if I want to retrieve only the documents that have the terms in this specific order? Can I form a query that would do that for me?

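Considering that this is possible for phrases by simply quoting the text: "you know" (which retrieves the first and second documents), it feels like there should be a way to preserve the order of multiple terms that are not adjacent.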

In the simple example above I could use a proximity search, but that does not cover more complex cases.

Answer 1

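You can use a span_near query, which has an in_order parameter. In the example below, "field" stands for the name of the field you are searching.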

{
    "query": {
        "span_near": {
            "clauses": [
                {
                    "span_term": {
                        "field": "you"
                    }
                },
                {
                    "span_term": {
                        "field": "search"
                    }
                }
            ],
            "slop": 2,
            "in_order": true
        }
    }
}
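
For more than two terms, the same request body can be generated rather than written by hand. The following is a minimal Python sketch, not part of the original answer: the helper name build_span_near and the field name "text" are hypothetical, and it only builds the JSON body without sending it to a cluster. Since span_term values are not analyzed, the terms should be the lowercased tokens that the standard analyzer would have indexed.

```python
import json


def build_span_near(field, terms, slop, in_order=True):
    """Build a span_near request body with one span_term clause per term.

    `field` is the name of the indexed text field (hypothetical here),
    `terms` are already-lowercased tokens, and `slop` caps how many
    unmatched positions may sit between the clauses.
    """
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }


if __name__ == "__main__":
    # Three non-adjacent terms that must appear in this order.
    body = build_span_near("text", ["you", "know", "search"], slop=10)
    print(json.dumps(body, indent=4))
```

The slop value is a judgment call: it has to be large enough to cover the widest gap you want to accept between the first and the last term.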

Answer 2

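Phrase matching doesn't guarantee order ;-). If you specify enough slop, 2 for example, "hello world" will match "world hello". That is not necessarily a bad thing, because a search is usually more relevant when two terms are "close" to each other, regardless of their order. And I don't think the authors of this feature had in mind matching words that are 1000 words apart.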
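
To see why a slop of 2 is enough to match the reversed phrase, here is a rough Python model. It is a simplification and an assumption on my part, not the engine's actual code: it shifts each term's position in the document by that term's offset in the query and measures how far the shifted positions spread apart. The function name min_slop_needed is hypothetical, and plain whitespace splitting stands in for the analyzer.

```python
from itertools import product


def min_slop_needed(doc_tokens, query_terms):
    """Return the smallest slop under which this simplified model matches.

    Returns None when a query term does not occur in the document.
    """
    # Collect every position at which each query term occurs.
    positions = [
        [i for i, token in enumerate(doc_tokens) if token == term]
        for term in query_terms
    ]
    if any(not found for found in positions):
        return None

    best = None
    # Try every combination of occurrences and keep the tightest one.
    for combo in product(*positions):
        # Shift each position by the term's offset inside the query.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


if __name__ == "__main__":
    query = ["hello", "world"]
    for text in ("hello world", "hello nice world", "world hello"):
        print(text, "->", min_slop_needed(text.split(), query))
```

Under this model the exact phrase needs a slop of 0, one intervening word needs 1, and swapping the two words needs 2, which is consistent with the behaviour described above.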

There is a solution I could find that preserves the order, though it is not a simple one: using scripts. Here's an example:

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 }}
{ "title": "hello world" }
{ "index": { "_id": 2 }}
{ "title": "world hello" }
{ "index": { "_id": 3 }}
{ "title": "hello term1 term2 term3 term4 world" }

POST my_index/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "title": {
            "query": "hello world",
            "slop": 5,
            "type": "phrase"
          }
        }
      },
      "filter": {
        "script": {
          "script": "term1Pos=0;term2Pos=0;term1Info = _index['title'].get('hello',_POSITIONS);term2Info = _index['title'].get('world',_POSITIONS); for(pos in term1Info){term1Pos=pos.position;}; for(pos in term2Info){term2Pos=pos.position;}; return term1Pos<term2Pos;",
          "params": {}
        }
      }
    }
  }
}

To make the script itself more readable, I am rewriting it here with indentation:

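// Positions default to 0 when a term is absent from the field.
term1Pos = 0;
term2Pos = 0;
term1Info = _index['title'].get('hello', _POSITIONS);
term2Info = _index['title'].get('world', _POSITIONS);
// Each loop keeps the last recorded position of its term.
for (pos in term1Info) {
  term1Pos = pos.position;
}
for (pos in term2Info) {
  term2Pos = pos.position;
}
// Keep the document only if 'hello' comes before 'world'.
return term1Pos < term2Pos;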

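The above is a search for "hello world" with a slop of 5, which on its own would match all of the documents above. But the script filter ensures that the position of the word "hello" in the document is lower than the position of the word "world". This way, no matter how much slop we set in the query, the fact that the positions come one after the other guarantees the order.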
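
The position check performed by the script filter can be tried outside ElasticSearch. The Python sketch below is only an illustration under stated assumptions: whitespace splitting and lowercasing stand in for the analyzer, the function names are hypothetical, and, like the script, it compares the last recorded position of each term and falls back to 0 when a term is missing.

```python
def last_position(tokens, term):
    """Return the last index at which `term` occurs, or 0 if it is absent."""
    position = 0
    for index, token in enumerate(tokens):
        if token == term:
            position = index
    return position


def appears_in_order(text, first, second):
    """Return True when `first` sits at a lower position than `second`."""
    tokens = text.lower().split()
    return last_position(tokens, first) < last_position(tokens, second)


if __name__ == "__main__":
    titles = [
        "hello world",
        "world hello",
        "hello term1 term2 term3 term4 world",
    ]
    for title in titles:
        print(title, "->", appears_in_order(title, "hello", "world"))
```

With the three titles indexed above, the first and third pass the check and the second does not. Because only the last occurrence of each term is compared, a document that repeats either word may need a stricter comparison.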

This is the section of the documentation that sheds more light on what is used in the script above.

Answer 3

This is exactly what a match_phrase query (see here) does.

It checks the positions of the terms, on top of their presence.

For example, these documents:

POST test/values
{
  "test": "Hello World"
}

POST test/values
{
  "test": "Hello nice World"
}

POST test/values
{
  "test": "World, I don't say hello"
}

will all be found by a basic match query:

POST test/_search
{
  "query": {
    "match": {
      "test": "Hello World"
    }
  }
}

But with match_phrase, only the first document is returned:

POST test/_search
{
  "query": {
    "match_phrase": {
      "test": "Hello World"
    }
  }
}

{
   ...
   "hits": {
      "total": 1,
      "max_score": 2.3953633,
      "hits": [
         {
            "_index": "test",
            "_type": "values",
            "_id": "qFZAKYOTQh2AuqplLQdHcA",
            "_score": 2.3953633,
            "_source": {
               "test": "Hello World"
            }
         }
      ]
   }
}

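In your case, you want to accept some distance between your terms. This can be achieved with the slop parameter, which indicates how far apart you allow your terms to be from one another: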

POST test/_search
{
  "query": {
    "match": {
      "test": {
        "query": "Hello world",
        "slop":1,
        "type": "phrase"
      }
    }
  }
}

With this last request, you find the second document as well:

{
   ...
   "hits": {
      "total": 2,
      "max_score": 0.38356602,
      "hits": [
         {
            "_index": "test",
            "_type": "values",
            "_id": "7mhBJgm5QaO2_aXOrTB_BA",
            "_score": 0.38356602,
            "_source": {
               "test": "Hello World"
            }
         },
         {
            "_index": "test",
            "_type": "values",
            "_id": "VKdUJSZFQNCFrxKk_hWz4A",
            "_score": 0.2169777,
            "_source": {
               "test": "Hello nice World"
            }
         }
      ]
   }
}
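
The slop request in this answer uses the "type": "phrase" option of the match query. As far as I know, the same search can also be written as a match_phrase query that carries its own slop parameter, which is the form later ElasticSearch releases expect. A minimal Python sketch that builds such a body follows; the helper name build_match_phrase is hypothetical, and nothing is sent to a cluster.

```python
import json


def build_match_phrase(field, text, slop=0):
    """Build a match_phrase request body for `field` with an optional slop."""
    return {
        "query": {
            "match_phrase": {
                field: {
                    "query": text,
                    "slop": slop,
                }
            }
        }
    }


if __name__ == "__main__":
    # Same search as the slop request above: field "test", slop of 1.
    print(json.dumps(build_match_phrase("test", "Hello world", slop=1), indent=2))
```

Keep in mind that a large enough slop also lets the terms match out of order, so this form alone does not enforce ordering.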

You can find a whole chapter about this use case in the definitive guide.

3 solutions

Solution 1
10, accepted, 2014-10-29 17:34:28

Solution 2
6, 2014-10-29 17:08:52

Solution 3
2, 2014-10-29 15:55:30
