Elasticsearch: return only a specific part of a document

Question

I have a JSON document that mimics the following structure.

{
"mydata": [
      {
        "Key1": "Hello",
        "Key2": "this",
        "Key3": "is",
        "Key4": "line one",
        "Key5": "of the file"
      },
      {
        "Key1": "Hello",
        "Key2": "this",
        "Key3": "is",
        "Key4": "line two",
        "Key5": "of the file"
      }]
}

The index I am using does not have any specific mapping. I can write a free-text Lucene query such as

mydata.Key4:"line one"

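and the result returns the whole document. In my case, however, I only want to retrieve the first part of the JSON object as the result. Is there a way to do this?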

{
        "Key1": "Hello",
        "Key2": "this",
        "Key3": "is",
        "Key4": "line one",
        "Key5": "of the file"
}

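I found that I can use _source_includes to retrieve specific fields by passing the keys I need, but I could not find an equivalent that returns all keys from the specific part of the JSON document that matches the query. Is it because of the way the document is indexed? Can someone guide me here?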

Edit:

I deleted the index and updated the mapping as follows

{
"mappings" : {
     
  "properties" : {
   "data" : {
    "type" : "nested"
   }
  }
 }
}

I reindexed the document, skimmed the ES documentation, and ran the following nested query.

{
"_source": false,
  "query": {
       "nested": {
          "path": "data",
          "query": {
          "match": { 
               "data.Key4": "line one" 
          }
       },
       "inner_hits": {} 
  }
 }
}

However, this also returns everything in my index; the only difference is that the results now appear under inner_hits

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.52889514,
        "hits": [{
            "_index": "myindex",
            "_type": "_doc",
            "_id": "QAZJ-nMBi6fwNevjDQJy",
            "_score": 0.52889514,
            "inner_hits": {
                "data": {
                    "hits": {
                        "total": {
                            "value": 2,
                            "relation": "eq"
                        },
                        "max_score": 0.87546873,
                        "hits": [{
                            "_index": "myindex",
                            "_type": "_doc",
                            "_id": "QAZJ-nMBi6fwNevjDQJy",
                            "_nested": {
                                "field": "data",
                                "offset": 0
                            },
                            "_score": 0.87546873,
                            "_source": {
                                "Key1": "Hello",
                                "Key2": "this",
                                "Key3": "is",
                                "Key4": "line one",
                                "Key5": "of the file"
                            }
                        }, {
                            "_index": "myindex",
                            "_type": "_doc",
                            "_id": "QAZJ-nMBi6fwNevjDQJy",
                            "_nested": {
                                "field": "data",
                                "offset": 1
                            },
                            "_score": 0.18232156,
                            "_source": {
                                "Key1": "Hello",
                                "Key2": "this",
                                "Key3": "is",
                                "Key4": "line two",
                                "Key5": "of the file"
                            }
                        }]
                    }
                }
            }
        }]
    }
}

Am I missing something here?

Answer 1

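The main problem is that you did not define a mapping. When you index the data the way you describe, each key is stored as a separate property of type text, and the array of objects is flattened, so Elasticsearch no longer knows which values belonged to the same array element.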

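When you run a search, it therefore returns the whole document. However, if you define a nested mapping for mydata, you can use inner_hits to retrieve only the matching nested objects.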
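As a minimal sketch of that mapping, the snippet below builds the index-creation body as a Python dict and prints it as JSON. The helper name build_nested_mapping is hypothetical, and the field name mydata is an assumption taken from the sample document; it has to match whatever name the indexed documents actually use (the question's later edit uses data). The type of an existing field cannot be changed in place, so a body like this would be sent when creating a fresh index, followed by reindexing.

```python
import json

# Assumption: the array field is called "mydata", as in the sample document.
NESTED_FIELD = "mydata"


def build_nested_mapping(field):
    """Return an index-creation body that maps `field` as a nested type."""
    return {"mappings": {"properties": {field: {"type": "nested"}}}}


mapping_body = build_nested_mapping(NESTED_FIELD)

# Print the body that would be sent when creating the index.
print(json.dumps(mapping_body, indent=2))
```

The keys inside each nested object are left to dynamic mapping here, which by default gives string values a text field plus a keyword sub-field.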

Edit:

The query to use:

{
  "_source": false,
  "query": {
    "nested": {
      "path": "data",
      "inner_hits": {        
      },
      "query": {
        "bool": {
          "must": [
            {
              "term": { //To look for exact match
                "data.Key4.keyword": "line one" //need to match line one not line two
              }
            }
          ]
        }
      }
    }
  }
}
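Because the query sets _source to false, the matched objects only appear under inner_hits in the response. The sketch below shows one way to pull them out on the client side. The function name matched_nested_objects and the trimmed sample_response are illustrative assumptions; the response shape follows the one shown in the question.

```python
def matched_nested_objects(response, path):
    """Collect the _source of every inner hit under `path`, across all top-level hits."""
    matched = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            matched.append(inner_hit["_source"])
    return matched


# Trimmed-down response in the shape shown in the question (hypothetical values).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "QAZJ-nMBi6fwNevjDQJy",
                "inner_hits": {
                    "data": {
                        "hits": {
                            "hits": [
                                {
                                    "_nested": {"field": "data", "offset": 0},
                                    "_source": {"Key4": "line one", "Key5": "of the file"},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

objects = matched_nested_objects(sample_response, "data")
print(objects)
```

With the term query above, only the object whose Key4 is exactly line one should end up in the inner hits, so the returned list would hold just that one object.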

What happens when you use match:

The string line one is tokenized as follows

{
    "tokens": [
        {
            "token": "line",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "one",
            "start_offset": 5,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

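Similarly, line two produces two tokens, line and two.

match is a full-text search query, so the text is analyzed both at index time and at search time. At search time, line one is analyzed and ES looks for line OR one. line two contains the token line, so that object is also part of the results.

To avoid this, you have to avoid analysis, which means using a term query. It looks for an exact match, and it should be run against the keyword sub-field (data.Key4.keyword), which dynamic mapping creates for string fields by default.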
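The difference can be illustrated with a small simulation in plain Python. This is only a sketch: analyze is a rough stand-in for the standard analyzer (lowercasing and splitting on whitespace), and match_hits and term_hits are hypothetical helpers that imitate the default OR behaviour of match and the exact comparison of term on a keyword field. No Elasticsearch call is involved.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_hits(query, value):
    """Imitate match with its default OR operator: one shared token is enough."""
    return bool(set(analyze(query)) & set(analyze(value)))


def term_hits(query, keyword_value):
    """Imitate term on a keyword field: compare the whole, unanalyzed string."""
    return query == keyword_value


values = ["line one", "line two"]

match_results = [v for v in values if match_hits("line one", v)]
term_results = [v for v in values if term_hits("line one", v)]

print(match_results)  # ['line one', 'line two']
print(term_results)   # ['line one']
```

Both values share the token line with the query, which is why match returns both nested objects, while the exact comparison keeps only line one.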

Answer 1: accepted, 2020-08-13 09:27:43
