Aggregating multiple bucket levels with nested documents in Elasticsearch

I'm currently working on an Elasticsearch project. I want to aggregate data from our existing documents.

The (simplified) structure is as follows:

{
  "products" : {
    "mappings" : {
      "product" : {
        "properties" : {
          "created" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss"
          },
          "description" : {
            "type" : "text"
          },
          "facets" : {
            "type" : "nested",
            "properties" : {
              "facet_id" : {
                "type" : "long"
              },
              "name_slug" : {
                "type" : "keyword"
              },
              "value_slug" : {
                "type" : "keyword"
              }
            }
          }
        }
      }
    }
  }
}

What I want to achieve with a single query:

  1. Select the unique facet names (name_slug)

  2. Under each facet name, list all of its corresponding facet values (value_slug)

Something like this:

- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)

Can you point me in the right direction? I've looked at the aggregations documentation, but it isn't clear enough for me to work out how to do this.

You'll want a nested aggregation wrapping terms sub-aggregations. Since the facet names and values live under the same nested path (facets), you can try this:

GET products/_search
{
  "size": 0,
  "aggs": {
    "by_facet_names_parent": {
      "nested": {
        "path": "facets"
      },
      "aggs": {
        "by_facet_names_nested": {
          "terms": {
            "field": "facets.name_slug",
            "size": 10
          },
          "aggs": {
            "by_facet_subvalues": {
              "terms": {
                "field": "facets.value_slug",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
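If you're building this request from Python, the body is just a nested dict. A minimal sketch, assuming you send it with whatever client you already use; build_facet_aggs is a hypothetical helper name. The two size parameters are exposed because each terms aggregation returns only its top size buckets (10 above), and whatever falls outside them is only reflected in sum_other_doc_count.

```python
import json


def build_facet_aggs(name_size: int = 10, value_size: int = 10) -> dict:
    """Build the nested > terms > terms aggregation body from the query above.

    build_facet_aggs is a hypothetical helper; the aggregation names match
    the ones used in the query above.
    """
    return {
        "size": 0,  # we only want aggregations, not hits
        "aggs": {
            "by_facet_names_parent": {
                # Step into the nested "facets" objects.
                "nested": {"path": "facets"},
                "aggs": {
                    "by_facet_names_nested": {
                        # One bucket per distinct facet name.
                        "terms": {"field": "facets.name_slug", "size": name_size},
                        "aggs": {
                            "by_facet_subvalues": {
                                # Inside each name bucket, one bucket per facet value.
                                "terms": {"field": "facets.value_slug", "size": value_size}
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_facet_aggs(), indent=2))
```

With the official elasticsearch-py client this dict can be used as the search request body; whether you pass it as a single body argument or as separate keyword arguments depends on the client version, so check the documentation for the one you use.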

The response should look something like this:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 30,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "by_facet_names_parent": {
      "doc_count": 90,
      "by_facet_names_nested": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 80,
        "buckets": [
          {
            "key": "0JDcya7Y7Y",     <-------- your facet name keyword
            "doc_count": 4,
            "by_facet_subvalues": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "3q4E9R6h5k",    <-------- one of the facet values + its count
                  "doc_count": 3
                },
                {
                  "key": "1q4E9R6h5k",   <-------- another facet value & count
                  "doc_count": 1
                }
              ]
            }
          },
          {
            "key": "0RyRKWugU1",
            "doc_count": 1,
            "by_facet_subvalues": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Af7qeCsXz6",
                  "doc_count": 1
                }
              ]
            }
          }
          .....
        ]
      }
    }
  }
}
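To turn that response into the "facet name, then its values with counts" layout sketched in the question, you only need to walk the two bucket levels. A minimal sketch, assuming the aggregation names from the query above; facet_tree and print_tree are hypothetical helpers, and SAMPLE_RESPONSE is just the two buckets from the example response with the unused fields trimmed.

```python
# facet_tree and print_tree are hypothetical helpers; they assume the
# aggregation names used in the query above.

# The two buckets from the example response above, with unused fields trimmed.
SAMPLE_RESPONSE = {
    "aggregations": {
        "by_facet_names_parent": {
            "doc_count": 90,
            "by_facet_names_nested": {
                "buckets": [
                    {
                        "key": "0JDcya7Y7Y",
                        "doc_count": 4,
                        "by_facet_subvalues": {
                            "buckets": [
                                {"key": "3q4E9R6h5k", "doc_count": 3},
                                {"key": "1q4E9R6h5k", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "0RyRKWugU1",
                        "doc_count": 1,
                        "by_facet_subvalues": {
                            "buckets": [{"key": "Af7qeCsXz6", "doc_count": 1}]
                        },
                    },
                ]
            },
        }
    }
}


def facet_tree(response: dict) -> dict:
    """Return {facet_name: {facet_value: doc_count}} from the aggregation response."""
    names_agg = response["aggregations"]["by_facet_names_parent"]["by_facet_names_nested"]
    tree = {}
    for name_bucket in names_agg["buckets"]:
        value_buckets = name_bucket["by_facet_subvalues"]["buckets"]
        # Python dicts keep insertion order, so the bucket order from Elasticsearch is preserved.
        tree[name_bucket["key"]] = {b["key"]: b["doc_count"] for b in value_buckets}
    return tree


def print_tree(tree: dict) -> None:
    """Print the '- facet_name' / '-- facet_sub_value (counter)' layout from the question."""
    for name, values in tree.items():
        print(f"- {name}")
        for value, count in values.items():
            print(f"-- {value} ({count})")


if __name__ == "__main__":
    print_tree(facet_tree(SAMPLE_RESPONSE))
```

The counters printed here are the doc_count values of the value buckets, which count nested facet objects rather than products.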

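Notice that the nested doc_count (90 here) can be greater than the number of matching product documents (hits.total, 30 here). This is because the nested aggregation treats each nested facets object as a separate document inside its parent product: a product with three facets contributes three to that count, and the doc_count of each facet name or value bucket likewise counts facet objects rather than products. This takes some time to digest, but it will make sense once you've experimented with it for a while.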
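To see why the numbers behave this way, here is a rough plain-Python emulation of the counting. It is only an illustration, not how Elasticsearch is implemented: PRODUCTS is hypothetical sample data, and nested_counts and products_per_name are illustrative helpers. Each facet object is counted as its own document, which mirrors how the nested aggregation counts the hidden nested documents.

```python
from collections import Counter

# Hypothetical sample products; only the nested "facets" field matters here.
PRODUCTS = [
    {"facets": [{"name_slug": "color", "value_slug": "red"},
                {"name_slug": "size", "value_slug": "m"}]},
    {"facets": [{"name_slug": "color", "value_slug": "red"},
                {"name_slug": "color", "value_slug": "blue"},
                {"name_slug": "size", "value_slug": "l"}]},
]


def nested_counts(products):
    """Count like nested > terms(name_slug) > terms(value_slug): one count per facet object."""
    facet_docs = [facet for product in products for facet in product["facets"]]
    per_name = Counter(f["name_slug"] for f in facet_docs)
    # Name and value come from the same facet object, which is why the mapping uses "nested".
    per_value = Counter((f["name_slug"], f["value_slug"]) for f in facet_docs)
    return len(facet_docs), per_name, per_value


def products_per_name(products):
    """Count parent products per facet name instead (each product counted once per name)."""
    return Counter(
        name
        for product in products
        for name in {f["name_slug"] for f in product["facets"]}
    )


if __name__ == "__main__":
    total, per_name, per_value = nested_counts(PRODUCTS)
    parents = products_per_name(PRODUCTS)
    print("products:", len(PRODUCTS))
    print("nested doc_count:", total)
    for name, count in per_name.most_common():
        print(f"{name}: {count} facet objects in {parents[name]} products")
```

With these two products the nested count is 5, and the color bucket counts 3 facet objects even though only 2 products have a color facet. If you need counts of parent products rather than facet objects, Elasticsearch's reverse_nested aggregation, placed under a bucket inside the nested aggregation, is worth looking at.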
