Aggregating multiple buckets with nested documents in Elasticsearch

Question

I'm currently working on an Elasticsearch project and want to aggregate data from our existing documents.

The (simplified) structure is as follows:

{
  "products" : {
    "mappings" : {
      "product" : {
        "properties" : {
          "created" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss"
          },
          "description" : {
            "type" : "text"
          },
          "facets" : {
            "type" : "nested",
            "properties" : {
              "facet_id" : {
                "type" : "long"
              },
              "name_slug" : {
                "type" : "keyword"
              },
              "value_slug" : {
                "type" : "keyword"
              }
            }
          }
        }
      }
    }
  }
}

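Ideally, I'd like to achieve the following with a single query:

Select the unique facet_name values
For each facet_name, get all the corresponding facet_values

Something like this: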

- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)

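Can you point me in the right direction? I've looked at the aggs query, but the documentation isn't clear enough for me to work out how to do this.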

Answer 1

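You'll want terms aggregations inside a nested aggregation. Since the facet names and values are under the same path, you can try the following: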

GET products/_search
{
  "size": 0,
  "aggs": {
    "by_facet_names_parent": {
      "nested": {
        "path": "facets"
      },
      "aggs": {
        "by_facet_names_nested": {
          "terms": {
            "field": "facets.name_slug",
            "size": 10
          },
          "aggs": {
            "by_facet_subvalues": {
              "terms": {
                "field": "facets.value_slug",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

Your response should look similar to this:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 30,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "by_facet_names_parent": {
      "doc_count": 90,
      "by_facet_names_nested": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 80,
        "buckets": [
          {
            "key": "0JDcya7Y7Y",     <-------- your facet name keyword
            "doc_count": 4,
            "by_facet_subvalues": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "3q4E9R6h5k",    <-------- one of the facet values + its count
                  "doc_count": 3
                },
                {
                  "key": "1q4E9R6h5k",   <-------- another facet value & count
                  "doc_count": 1
                }
              ]
            }
          },
          {
            "key": "0RyRKWugU1",
            "doc_count": 1,
            "by_facet_subvalues": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Af7qeCsXz6",
                  "doc_count": 1
                }
              ]
            }
          }
          .....
        ]
      }
    }
  }
}

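Note that the nested doc_count (90 here) can be >= the number of actual product documents (30 here). This is because the nested aggregation treats each nested sub-document as a separate document within its parent. It takes a while to digest, but it starts to make sense once you have played with nested aggregations long enough.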
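
If you then want the response in the two-level shape from the question, something along these lines might work. It is only a minimal sketch in plain Python: the helper name `facets_from_response` is made up for illustration, it assumes the aggregation names used in the request above, and `sample` is a trimmed copy of the example response rather than real data.

```python
from typing import Dict


def facets_from_response(response: dict) -> Dict[str, Dict[str, int]]:
    """Map each facet name to its {facet value: doc_count} pairs.

    Hypothetical helper; the aggregation names are the ones used in
    the request from this answer, not anything built into Elasticsearch.
    """
    nested = response["aggregations"]["by_facet_names_parent"]
    result: Dict[str, Dict[str, int]] = {}
    for name_bucket in nested["by_facet_names_nested"]["buckets"]:
        value_buckets = name_bucket["by_facet_subvalues"]["buckets"]
        result[name_bucket["key"]] = {
            bucket["key"]: bucket["doc_count"] for bucket in value_buckets
        }
    return result


# Trimmed copy of the example response (only the two buckets shown above).
sample = {
    "aggregations": {
        "by_facet_names_parent": {
            "doc_count": 90,
            "by_facet_names_nested": {
                "buckets": [
                    {
                        "key": "0JDcya7Y7Y",
                        "doc_count": 4,
                        "by_facet_subvalues": {
                            "buckets": [
                                {"key": "3q4E9R6h5k", "doc_count": 3},
                                {"key": "1q4E9R6h5k", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "0RyRKWugU1",
                        "doc_count": 1,
                        "by_facet_subvalues": {
                            "buckets": [{"key": "Af7qeCsXz6", "doc_count": 1}]
                        },
                    },
                ]
            },
        }
    }
}

# Print the facet name / facet value (count) outline from the question.
for name, values in facets_from_response(sample).items():
    print(f"- {name}")
    for value, count in values.items():
        print(f"-- {value} ({count})")
```

Keep in mind that both terms aggregations in the request use "size": 10, so only the top 10 names and the top 10 values per name come back; raise the sizes if you need more.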
