[英]Elasticsearch: Querying nested objects
尊敬的Elasticsearch專家,
我在查詢嵌套對象時遇到問題。 讓我們使用以下簡化的映射:
{
"mappings" : {
"_doc" : {
"properties" : {
"companies" : {
"type": "nested",
"properties" : {
"company_id": { "type": "long" },
"name": { "type": "text" }
}
},
"title": { "type": "text" }
}
}
}
}
並將一些文檔放在索引中:
PUT my_index/_doc/1
{
"title" : "CPU release",
"companies" : [
{ "company_id" : 1, "name" : "AMD" },
{ "company_id" : 2, "name" : "Intel" }
]
}
PUT my_index/_doc/2
{
"title" : "GPU release 2018-01-10",
"companies" : [
{ "company_id" : 1, "name" : "AMD" },
{ "company_id" : 3, "name" : "Nvidia" }
]
}
PUT my_index/_doc/3
{
"title" : "GPU release 2018-03-01",
"companies" : [
{ "company_id" : 3, "name" : "Nvidia" }
]
}
PUT my_index/_doc/4
{
"title" : "Chipset release",
"companies" : [
{ "company_id" : 2, "name" : "Intel" }
]
}
現在我想執行這樣的查詢:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "GPU" } },
{ "nested": {
"path": "companies",
"query": {
"bool": {
"must": [
{ "match": { "companies.name": "AMD" } }
]
}
},
"inner_hits" : {}
}
}
]
}
}
}
結果,我想獲得具有匹配文件數量的匹配公司。 所以上面的查詢應該給我:
[
{ "company_id" : 1, "name" : "AMD", "matched_documents:": 1 }
]
以下查詢:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "GPU" } }
{ "nested": {
"path": "companies",
"query": { "match_all": {} },
"inner_hits" : {}
}
}
]
}
}
}
應該給我所有分配給文檔的公司,該公司的標題包含“ GPU”以及匹配的文檔數:
[
{ "company_id" : 1, "name" : "AMD", "matched_documents:": 1 },
{ "company_id" : 3, "name" : "Nvidia", "matched_documents:": 2 }
]
是否有可能具有良好的性能來達到此結果? 我顯然對匹配的文檔不感興趣,僅對匹配的文檔和嵌套對象的數量不感興趣。
謝謝你的幫助。
關於Elasticsearch,您需要做的是:
title
使用GPU
或在companies
列表中提及Nvidia
); company_id
); 數組中的每個nested
對象都被索引為一個單獨的隱藏文檔 ,這使生活變得有些復雜。 讓我們看看如何匯總它們。
nested
文檔呢? 您可以結合使用nested , term和top_hits聚合來實現此目的 :
POST my_index/doc/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "GPU"
}
},
{
"nested": {
"path": "companies",
"query": {
"match_all": {}
}
}
}
]
}
},
"aggs": {
"Extract nested": {
"nested": {
"path": "companies"
},
"aggs": {
"By company id": {
"terms": {
"field": "companies.company_id"
},
"aggs": {
"Examples of such company_id": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}
這將給出以下輸出:
{
...
"hits": { ... },
"aggregations": {
"Extract nested": {
"doc_count": 4, <== How many "nested" documents there were?
"By company id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3, <== this bucket's key: "company_id": 3
"doc_count": 2, <== how many "nested" documents there were with such company_id?
"Examples of such company_id": {
"hits": {
"total": 2,
"max_score": 1.5897496,
"hits": [ <== an example, "top hit" for such company_id
{
"_nested": {
"field": "companies",
"offset": 1
},
"_score": 1.5897496,
"_source": {
"company_id": 3,
"name": "Nvidia"
}
}
]
}
}
},
{
"key": 1,
"doc_count": 1,
"Examples of such company_id": {
"hits": {
"total": 1,
"max_score": 1.5897496,
"hits": [
{
"_nested": {
"field": "companies",
"offset": 0
},
"_score": 1.5897496,
"_source": {
"company_id": 1,
"name": "AMD"
}
}
]
}
}
}
]
}
}
}
}
注意,對於Nvidia
我們有"doc_count": 2
。
但是,如果我們要計算擁有Nvidia
與Intel
的“父”對象的數量呢?
nested
存儲桶計算父對象怎么辦? 可以使用reverse_nested
聚合來實現。
我們只需要稍微更改一下查詢:
POST my_index/doc/_search
{
"query": { ... },
"aggs": {
"Extract nested": {
"nested": {
"path": "companies"
},
"aggs": {
"By company id": {
"terms": {
"field": "companies.company_id"
},
"aggs": {
"Examples of such company_id": {
"top_hits": {
"size": 1
}
},
"original doc count": { <== we ask ES to count how many there are parent docs
"reverse_nested": {}
}
}
}
}
}
}
}
結果將如下所示:
{
...
"hits": { ... },
"aggregations": {
"Extract nested": {
"doc_count": 3,
"By company id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 2,
"original doc count": {
"doc_count": 2 <== how many "parent" documents have such company_id
},
"Examples of such company_id": {
"hits": {
"total": 2,
"max_score": 1.5897496,
"hits": [
{
"_nested": {
"field": "companies",
"offset": 1
},
"_score": 1.5897496,
"_source": {
"company_id": 3,
"name": "Nvidia"
}
}
]
}
}
},
{
"key": 1,
"doc_count": 1,
"original doc count": {
"doc_count": 1
},
"Examples of such company_id": {
"hits": {
"total": 1,
"max_score": 1.5897496,
"hits": [
{
"_nested": {
"field": "companies",
"offset": 0
},
"_score": 1.5897496,
"_source": {
"company_id": 1,
"name": "AMD"
}
}
]
}
}
}
]
}
}
}
}
為了使區別變得明顯,讓我們稍微更改一下數據,然后在文檔列表中添加另一個Nvidia
項目:
PUT my_index/doc/2
{
"title" : "GPU release 2018-01-10",
"companies" : [
{ "company_id" : 1, "name" : "AMD" },
{ "company_id" : 3, "name" : "Nvidia" },
{ "company_id" : 3, "name" : "Nvidia" }
]
}
最后一個查詢(帶有reverse_nested
查詢)將為我們提供以下內容:
"By company id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 3, <== 3 "nested" documents with Nvidia
"original doc count": {
"doc_count": 2 <== but only 2 "parent" documents
},
"Examples of such company_id": {
"hits": {
"total": 3,
"max_score": 1.5897496,
"hits": [
{
"_nested": {
"field": "companies",
"offset": 2
},
"_score": 1.5897496,
"_source": {
"company_id": 3,
"name": "Nvidia"
}
}
]
}
}
},
如您所見,這是一個難以理解的細微差別,但它完全改變了語義。
雖然在大多數情況下, nested
查詢和聚合的性能應該足夠,但是當然要付出一定的代價。 因此,建議在調整搜索速度時避免使用nested
或父子類型。
在Elasticsearch中,盡管沒有單一的配方,但通常可以通過反規范化來獲得最佳性能,您應該根據需要選擇數據模型。
希望這可以為您澄清一下nested
東西!
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.