
Elasticsearch: Querying nested objects

Dear Elasticsearch experts,
I have a problem querying nested objects. Let's use the following simplified mapping:

{
  "mappings" : {
    "_doc" : {
      "properties" : {
        "companies" : {
          "type": "nested",
          "properties" : {
            "company_id": { "type": "long" },
            "name": { "type": "text" }
          }
        },
        "title": { "type": "text" }
      }
    }
  }
}

and put some documents into the index:

PUT my_index/_doc/1
{
  "title" : "CPU release",
  "companies" : [
    { "company_id" : 1, "name" :  "AMD" },
    { "company_id" : 2, "name" :  "Intel" }
  ]
}

PUT my_index/_doc/2
{
  "title" : "GPU release 2018-01-10",
  "companies" : [
    { "company_id" : 1, "name" :  "AMD" },
    { "company_id" : 3, "name" :  "Nvidia" }
  ]
}

PUT my_index/_doc/3
{
  "title" : "GPU release 2018-03-01",
  "companies" : [
    { "company_id" : 3, "name" :  "Nvidia" }
  ]
}

PUT my_index/_doc/4
{
  "title" : "Chipset release",
  "companies" : [
    { "company_id" : 2, "name" :  "Intel" }
  ]
}

Now I'd like to execute a query like this:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "GPU" } },
        { "nested": {
            "path": "companies",
            "query": {
              "bool": {
                "must": [
                  { "match": { "companies.name": "AMD" } }
                ]
              }
            },
            "inner_hits" : {}
          }
        }
      ]
    }
  }
}

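As a result, I'd like to get the matching companies together with the number of matching documents. So the query above should give me: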

[
  { "company_id" : 1, "name" : "AMD", "matched_documents": 1 }
]

The following query:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "GPU" } },
        { "nested": {
            "path": "companies",
            "query": { "match_all": {} },
            "inner_hits" : {}
          }
        }
      ]
    }
  }
}

should give me all companies assigned to documents whose title contains "GPU", along with the number of matching documents:

[
  { "company_id" : 1, "name" : "AMD", "matched_documents": 1 },
  { "company_id" : 3, "name" : "Nvidia", "matched_documents": 2 }
]

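Is it possible to get this result with good performance? I'm explicitly not interested in the matching documents themselves, only in the nested objects and the number of documents matching each of them.

Thanks for your help.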

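In Elasticsearch terms, what you need to do is:

  1. filter the "parent" documents on the desired criteria (for example, having GPU in title, or mentioning Nvidia in the companies list);
  2. group the "nested" documents into buckets by some criterion (for example, company_id);
  3. count how many "nested" documents there are in each bucket.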
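
The three steps above can be sketched in plain Python on the sample data from the question. This is only an illustration of the logic, not how Elasticsearch executes it: the function name `count_nested_by_company` is hypothetical, and the whole-word test on the title is a crude stand-in for the match query.

```python
from collections import Counter

# Sample documents copied from the question.
docs = [
    {"title": "CPU release", "companies": [
        {"company_id": 1, "name": "AMD"}, {"company_id": 2, "name": "Intel"}]},
    {"title": "GPU release 2018-01-10", "companies": [
        {"company_id": 1, "name": "AMD"}, {"company_id": 3, "name": "Nvidia"}]},
    {"title": "GPU release 2018-03-01", "companies": [
        {"company_id": 3, "name": "Nvidia"}]},
    {"title": "Chipset release", "companies": [
        {"company_id": 2, "name": "Intel"}]},
]


def count_nested_by_company(documents, title_term):
    """Hypothetical helper that mirrors the three steps on in-memory data."""
    # Step 1: keep the "parent" documents whose title contains the term
    # as a whole word (rough approximation of a match query).
    parents = [
        d for d in documents
        if title_term.lower() in d["title"].lower().split()
    ]
    # Steps 2 and 3: bucket the nested objects by company_id and count them.
    return Counter(c["company_id"] for d in parents for c in d["companies"])


counts = count_nested_by_company(docs, "GPU")
print(dict(counts))
```

For the term "GPU" only documents 2 and 3 survive step 1, so company 3 (Nvidia) is counted twice and company 1 (AMD) once.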

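Each of the nested objects in the array is indexed as a separate hidden document, which complicates life a bit. Let's see how to aggregate on them.

So how do we aggregate and count the nested documents?

You can achieve this with a combination of the nested, terms and top_hits aggregations: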

POST my_index/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "GPU"
          }
        },
        {
          "nested": {
            "path": "companies",
            "query": {
              "match_all": {}
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "Extract nested": {
      "nested": {
        "path": "companies"
      },
      "aggs": {
        "By company id": {
          "terms": {
            "field": "companies.company_id"
          },
          "aggs": {
            "Examples of such company_id": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}

This gives the following output:

{
  ...
  "hits": { ... },
  "aggregations": {
    "Extract nested": {
      "doc_count": 3, <== how many "nested" documents there were in total
      "By company id": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 3,  <== this bucket's key: "company_id": 3
            "doc_count": 2, <== how many "nested" documents there were with such company_id?
            "Examples of such company_id": {
              "hits": {
                "total": 2,
                "max_score": 1.5897496,
                "hits": [  <== an example, "top hit" for such company_id
                  {
                    "_nested": {
                      "field": "companies",
                      "offset": 1
                    },
                    "_score": 1.5897496,
                    "_source": {
                      "company_id": 3,
                      "name": "Nvidia"
                    }
                  }
                ]
              }
            }
          },
          {
            "key": 1,
            "doc_count": 1,
            "Examples of such company_id": {
              "hits": {
                "total": 1,
                "max_score": 1.5897496,
                "hits": [
                  {
                    "_nested": {
                      "field": "companies",
                      "offset": 0
                    },
                    "_score": 1.5897496,
                    "_source": {
                      "company_id": 1,
                      "name": "AMD"
                    }
                  }
                ]
              }
            }
          }
        ]
      }
    }
  }
}

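Note that for Nvidia we get "doc_count": 2.

But what if we want to count the number of "parent" objects that mention Nvidia or Intel?

What if we want to count parent objects based on a nested bucket?

This can be achieved with the reverse_nested aggregation.

We only need to change our query a little: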

POST my_index/_doc/_search
{
  "query": { ... },
  "aggs": {
    "Extract nested": {
      "nested": {
        "path": "companies"
      },
      "aggs": {
        "By company id": {
          "terms": {
            "field": "companies.company_id"
          },
          "aggs": {
            "Examples of such company_id": {
              "top_hits": {
                "size": 1
              }
            },
            "original doc count": { <== we ask ES to count how many there are parent docs
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

The result will look like this:

{
  ...
  "hits": { ... },
  "aggregations": {
    "Extract nested": {
      "doc_count": 3,
      "By company id": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 3,
            "doc_count": 2,
            "original doc count": {
              "doc_count": 2  <== how many "parent" documents have such company_id
            },
            "Examples of such company_id": {
              "hits": {
                "total": 2,
                "max_score": 1.5897496,
                "hits": [
                  {
                    "_nested": {
                      "field": "companies",
                      "offset": 1
                    },
                    "_score": 1.5897496,
                    "_source": {
                      "company_id": 3,
                      "name": "Nvidia"
                    }
                  }
                ]
              }
            }
          },
          {
            "key": 1,
            "doc_count": 1,
            "original doc count": {
              "doc_count": 1
            },
            "Examples of such company_id": {
              "hits": {
                "total": 1,
                "max_score": 1.5897496,
                "hits": [
                  {
                    "_nested": {
                      "field": "companies",
                      "offset": 0
                    },
                    "_score": 1.5897496,
                    "_source": {
                      "company_id": 1,
                      "name": "AMD"
                    }
                  }
                ]
              }
            }
          }
        ]
      }
    }
  }
}
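
To get from this response to the flat list asked for in the question, the buckets can be reshaped on the client side. The sketch below assumes the aggregation names used in the request above and a response already parsed into a Python dict; the function name `companies_with_counts` is hypothetical and the response literal is a trimmed copy of the output shown above.

```python
def companies_with_counts(response):
    """Hypothetical helper: flatten the terms buckets into the desired list."""
    buckets = response["aggregations"]["Extract nested"]["By company id"]["buckets"]
    result = []
    for bucket in buckets:
        # top_hits was requested with size 1, so each bucket has one example hit.
        source = bucket["Examples of such company_id"]["hits"]["hits"][0]["_source"]
        result.append({
            "company_id": bucket["key"],
            "name": source["name"],
            # Number of parent documents, taken from the reverse_nested sub-aggregation.
            "matched_documents": bucket["original doc count"]["doc_count"],
        })
    return result


def _bucket(company_id, name, nested_count, parent_count):
    # Builds a bucket with only the fields the helper reads (trimmed for brevity).
    return {
        "key": company_id,
        "doc_count": nested_count,
        "original doc count": {"doc_count": parent_count},
        "Examples of such company_id": {
            "hits": {"hits": [{"_source": {"company_id": company_id, "name": name}}]}
        },
    }


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "Extract nested": {
            "doc_count": 3,
            "By company id": {
                "buckets": [_bucket(3, "Nvidia", 2, 2), _bucket(1, "AMD", 1, 1)]
            },
        }
    }
}

flat = companies_with_counts(response)
print(flat)
```

Reading the count from the reverse_nested sub-aggregation rather than from the bucket's own doc_count keeps the number tied to parent documents, which is what the question calls matched documents.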

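How can I spot the difference?

To make the difference evident, let's change the data a bit and add another Nvidia item to the companies list of document 2: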

PUT my_index/_doc/2
{
  "title" : "GPU release 2018-01-10",
  "companies" : [
    { "company_id" : 1, "name" :  "AMD" },
    { "company_id" : 3, "name" :  "Nvidia" },
    { "company_id" : 3, "name" :  "Nvidia" }
  ]
}

The last query (the one with the reverse_nested aggregation) will now give us the following:

  "By company id": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 3,
        "doc_count": 3,    <== 3 "nested" documents with Nvidia
        "original doc count": {
          "doc_count": 2   <== but only 2 "parent" documents
        },
        "Examples of such company_id": {
          "hits": {
            "total": 3,
            "max_score": 1.5897496,
            "hits": [
              {
                "_nested": {
                  "field": "companies",
                  "offset": 2
                },
                "_score": 1.5897496,
                "_source": {
                  "company_id": 3,
                  "name": "Nvidia"
                }
              }
            ]
          }
        }
      },

As you can see, the difference is subtle and easy to miss, but it changes the semantics completely.
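
The same distinction can be reproduced in a few lines of Python. This is a minimal sketch of the counting logic only, using the two documents that match "GPU" after document 2 gained its second Nvidia entry; the variable names are illustrative.

```python
from collections import Counter

# company_ids per matching parent document, after the update to document 2.
gpu_docs = {
    2: [1, 3, 3],  # AMD, Nvidia, Nvidia
    3: [3],        # Nvidia
}

# What the terms bucket's doc_count reflects: one count per nested object.
nested_counts = Counter(cid for ids in gpu_docs.values() for cid in ids)

# What reverse_nested reflects: one count per distinct parent document.
parent_counts = Counter(cid for ids in gpu_docs.values() for cid in set(ids))

print(nested_counts[3], parent_counts[3])  # prints: 3 2
```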

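What about performance?

In most cases the performance of nested queries and aggregations should be sufficient, but it does come at a cost. It is therefore recommended to avoid nested or parent-child types when tuning for search speed.

In Elasticsearch the best performance is often achieved through denormalization, although there is no single recipe, and you should choose the data model that fits your needs.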
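
Since the question only needs the counts and not the matching documents, one small addition that is not part of the answer above is to set "size": 0 in the search body, which asks Elasticsearch to skip returning the top-level hits. The sketch below builds such a request body in Python; the function name `build_count_only_request` is hypothetical, and whether this makes a measurable difference depends on your data.

```python
import json


def build_count_only_request(title_term):
    """Hypothetical helper: same query and aggregations as above, without hits."""
    return {
        "size": 0,  # return aggregations only, no top-level hits
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": title_term}},
                    {"nested": {"path": "companies", "query": {"match_all": {}}}},
                ]
            }
        },
        "aggs": {
            "Extract nested": {
                "nested": {"path": "companies"},
                "aggs": {
                    "By company id": {
                        "terms": {"field": "companies.company_id"},
                        "aggs": {
                            "Examples of such company_id": {"top_hits": {"size": 1}},
                            "original doc count": {"reverse_nested": {}},
                        },
                    }
                },
            }
        },
    }


body = build_count_only_request("GPU")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request used earlier.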

Hope this clarifies the nested stuff a bit for you!
