ElasticSearch aggregate nested fields

I have a product repository with the following mapping:

settings do
  mapping do
    indexes :name
    indexes :vendor_id,   type: 'integer'
    indexes :category_id, type: 'integer'

    indexes :spec_entries, type: 'nested' do
      indexes :spec_id,     type: 'integer'
      indexes :value_id,    type: 'integer'
      indexes :name,        index: 'no'
      indexes :description, index: 'no'
      indexes :value,       index: 'no'
    end
  end
end

Spec entries are product specifications (e.g. Fork: Air), where Fork is the name and Air is the value. Each entry also carries a specification ID, a specification value ID, and a specification description.

I need to get an aggregation result like this:

[
...
{
  id: 335,
  name: "Fork",
  description: "There are few common types of fork — elastomer, oil and air",
  count: 30,
  values: [{
    id: 645,
    name: "Elastomer",
    count: 17
  }, {
    id: 643,
    name: "Oil",
    count: 10
  }, {
    id: 649,
    name: "Air",
    count: 3
  }, ]
},
...
]

Specs and values should be ordered by count.

What type of aggregation do I need to use?

You want to use the nested aggregation (because spec_entries has the nested type), then a terms aggregation on spec_entries.name, and a top_hits sub-aggregation to get the top nested spec_entries for each name. Something like this should do:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "spec_names": {
      "nested": {
        "path": "spec_entries"
      },
      "aggs": {
        "names": {
          "terms": {
            "field": "spec_entries.name"
          },
          "aggs": {
            "top_entries": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
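To get closer to the structure asked for in the question (specs with their values, both ordered by count), one option is to nest a second terms aggregation inside the first. The Ruby sketch below builds such a request body as a plain hash. It assumes bucketing on spec_entries.spec_id and spec_entries.value_id, which, unlike name, are indexed in the mapping above, and it uses a top_hits of size 1 to read the name, description and value from one representative entry. The method name spec_aggregation_body and the aggregation names are made up for illustration. Terms buckets are sorted by descending doc_count by default, and doc_count here counts nested spec entries.

```ruby
require 'json'

# Builds a search body that buckets nested spec entries by spec, then by value.
# Assumption: spec_id and value_id are the fields to bucket on; the aggregation
# names ("specs", "by_spec", "by_value", "sample") are illustrative only.
def spec_aggregation_body(size: 10)
  # One representative entry per bucket, to read name/description/value from.
  sample = { 'top_hits' => { 'size' => 1 } }

  {
    'query' => { 'match_all' => {} },
    'aggs' => {
      'specs' => {
        'nested' => { 'path' => 'spec_entries' },
        'aggs' => {
          'by_spec' => {
            'terms' => { 'field' => 'spec_entries.spec_id', 'size' => size },
            'aggs' => {
              'sample' => sample,
              'by_value' => {
                'terms' => { 'field' => 'spec_entries.value_id', 'size' => size },
                'aggs' => { 'sample' => sample }
              }
            }
          }
        }
      }
    }
  }
end

puts JSON.pretty_generate(spec_aggregation_body)
```

The printed JSON can be sent as the search body; each by_spec bucket would then contain its own by_value buckets, which maps fairly directly onto the id/count/values layout in the question.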

And that would yield something very close to what you expect:

{
  ...
  "aggregations" : {
    "spec_names" : {
      "doc_count" : 1,
      "names" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [ {
          "key" : "fork",     <-------- The spec_entries name
          "doc_count" : 1,
          "top_entries" : {
            "hits" : {
              "total" : 1,
              "max_score" : 1.0,
              "hits" : [ {
                "_index" : "tests",
                "_type" : "test1",
                "_id" : "1",
                "_nested" : {
                  "field" : "spec_entries",
                  "offset" : 0
                },
                "_score" : 1.0,
                "_source":{  <-------- For each name, the top spec_entries content (value, id, desc, etc)
                  "name":"Fork",
                  "value":"Air",
                  "description":"desc",
                  "spec_id":1,
                  "value_id":1
                }
              } ]
            }
          }
        } ]
      }
    }
  }
}
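A response of this shape still has to be flattened on the client. The following Ruby sketch shows one way to do it, assuming the response has already been parsed into a hash with string keys. The method name summarize_spec_buckets and the output keys are hypothetical, and the top_hits sub-aggregation is looked up under the name used in the request (top_entries, overridable through hits_agg). Because top_hits returns only a handful of entries per bucket, sample_values is a sample and not a complete list with counts.

```ruby
# Flattens the "names" buckets of a parsed response into one summary per spec.
# Assumption: the top_hits sub-aggregation is keyed by `hits_agg`.
def summarize_spec_buckets(response, hits_agg: 'top_entries')
  buckets = response.dig('aggregations', 'spec_names', 'names', 'buckets') || []

  summaries = buckets.map do |bucket|
    # Each hit's _source is one nested spec entry.
    entries = bucket.dig(hits_agg, 'hits', 'hits').to_a.map { |hit| hit['_source'] }
    first = entries.first || {}

    {
      id: first['spec_id'],
      name: first['name'] || bucket['key'],
      description: first['description'],
      count: bucket['doc_count'],
      sample_values: entries.map { |e| { id: e['value_id'], name: e['value'] } }.uniq
    }
  end

  # Most frequent specs first.
  summaries.sort_by { |spec| -spec[:count] }
end

# Sample input modeled on the response shown above.
response = {
  'aggregations' => {
    'spec_names' => {
      'doc_count' => 1,
      'names' => {
        'buckets' => [
          {
            'key' => 'fork',
            'doc_count' => 1,
            'top_entries' => {
              'hits' => {
                'total' => 1,
                'hits' => [
                  { '_source' => { 'name' => 'Fork', 'value' => 'Air', 'description' => 'desc',
                                   'spec_id' => 1, 'value_id' => 1 } }
                ]
              }
            }
          }
        ]
      }
    }
  }
}

p summarize_spec_buckets(response)
```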

Note that using the top_hits aggregation as a sub-aggregation of the nested aggregation only works from ES 1.5 onward.
