Elasticsearch: aggregating field values of child documents

Question

I'm facing the following problem: selecting and sorting parent documents based on an aggregated value over their child documents. The aggregation itself (e.g. a sum) depends on a query string, i.e. on which child documents are relevant for the aggregation.

Example: given the documents basket A and basket B, for each basket document I want to sum the number field of those fruit children whose name field matches my query, e.g. apples.

PUT /baskets/_doc/0
{
  "name": "basket A", 
  "fruit": [
    {
      "name": "apples",
      "number": 2
    },
    {
      "name": "oranges",
      "number": 3
    }
  ]
}

PUT /baskets/_doc/1
{
  "name": "basket B",
  "fruit": [
    {
      "name": "apples",
      "number": 3
    },
    {
      "name": "apples",
      "number": 3
    }
  ]
}

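Mappings (the index must be created with these before the documents are indexed; otherwise dynamic mapping turns fruit into a plain object field instead of nested):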

PUT /baskets
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "fruit": { 
        "type": "nested",
        "properties": {
          "name": { "type": "text" },
          "number": { "type": "long" }
        }
      }
    }
  }
}

Use case 1: Which basket has (strictly) more than 5 apples? I would expect only basket B.
Use case 2: Sort baskets by number of apples. I would expect basket B (6 apples in total) first, then basket A (2 apples in total).
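To make the expected results concrete independently of Elasticsearch, here is a minimal plain-Python sketch of the intended semantics over the two sample documents. The function names are hypothetical, and fruit names are compared by exact string equality (closer to a keyword field than to analyzed text).

```python
# Sample data copied from the two PUT requests above.
BASKETS = [
    {"name": "basket A", "fruit": [{"name": "apples", "number": 2},
                                   {"name": "oranges", "number": 3}]},
    {"name": "basket B", "fruit": [{"name": "apples", "number": 3},
                                   {"name": "apples", "number": 3}]},
]


def fruit_total(basket, fruit_name):
    """Sum `number` over the basket's fruit entries whose name equals fruit_name."""
    return sum(f["number"] for f in basket["fruit"] if f["name"] == fruit_name)


def baskets_with_more_than(baskets, fruit_name, threshold):
    """Use case 1: names of baskets whose total is strictly greater than threshold."""
    return [b["name"] for b in baskets if fruit_total(b, fruit_name) > threshold]


def baskets_sorted_by_total(baskets, fruit_name):
    """Use case 2: (name, total) pairs ordered by total, largest first."""
    totals = [(b["name"], fruit_total(b, fruit_name)) for b in baskets]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


print(baskets_with_more_than(BASKETS, "apples", 5))  # ['basket B']
print(baskets_sorted_by_total(BASKETS, "apples"))    # [('basket B', 6), ('basket A', 2)]
```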

How can one implement this using the Elasticsearch (7.8.0) query DSL?

So far I have tried nested queries and aggregations, without success.

Thanks!

Edit: Added mappings

Edit: Updated the numbers to better reflect the problem

Edit: Added a possible answer to use case 2 (see the comments on the answer from @joe):

GET /baskets/_search
{
  "aggs": {
    "aggs_baskets": {
      "terms": {
        "field": "name",
        "order": {"nest > fruit_filter > fruit_sum": "desc"}
      },
      "aggs": {
        "nest":{
          "nested":{
            "path": "fruit"
          },
          "aggs":{
            "fruit_filter":{
              "filter": {
                "term": {"fruit.name": "apples"}
              },
              "aggs":{
                "fruit_sum":{
                  "sum": {"field": "fruit.number"}
                }
              }
            }
          }
        }
      }
    }
  }
}

Answer 1

Use case 1:

GET baskets/_search
{
  "query": {
    "nested": {
      "path": "fruit",
      "inner_hits": {}, 
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "fruit.name": {
                  "value": "apples"
                }
              }
            },
            {
              "range": {
                "fruit.number": {
                  "gte": 5
                }
              }
            }
          ]
        }
      }
    }
  }
}

Strictly more than 5: use gt; 5 or more (>= 5): use gte.

Also notice the inner_hits part: it returns the actual nested subdocument(s) that caused this particular basket to match the query. It's not required, but good to know.
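A nested query evaluates its inner bool query against each nested fruit object separately, so a basket matches only if a single entry is an apples entry with number >= 5; it does not add numbers across entries. The plain-Python sketch below (an illustration with hypothetical helper names, not Elasticsearch code) contrasts that per-entry check with a per-basket total on the sample data.

```python
# Sample data copied from the question's PUT requests.
BASKETS = [
    {"name": "basket A", "fruit": [{"name": "apples", "number": 2},
                                   {"name": "oranges", "number": 3}]},
    {"name": "basket B", "fruit": [{"name": "apples", "number": 3},
                                   {"name": "apples", "number": 3}]},
]


def matches_per_entry(basket, fruit_name, minimum):
    """Mimics the nested bool query: ONE entry must satisfy both conditions."""
    return any(f["name"] == fruit_name and f["number"] >= minimum
               for f in basket["fruit"])


def matches_on_total(basket, fruit_name, minimum):
    """Checks the SUM of matching entries instead, which is what use case 1 asks for."""
    total = sum(f["number"] for f in basket["fruit"] if f["name"] == fruit_name)
    return total >= minimum


per_entry = [b["name"] for b in BASKETS if matches_per_entry(b, "apples", 5)]
on_total = [b["name"] for b in BASKETS if matches_on_total(b, "apples", 5)]
print(per_entry)  # []
print(on_total)   # ['basket B']
```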

Use case 2:

GET baskets/_search
{
  "sort": [
    {
      "fruit.number": {
        "mode": "sum",
        "order": "desc",
        "nested": {
          "path": "fruit",
          "filter": {
            "term": { "fruit.name": "apples" }
          }
        }
      }
    }
  ]
}
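When sorting on a field inside nested objects, Elasticsearch reduces each basket's fruit.number values to a single sort value; the sort mode option chooses how (for example min, max, sum, avg), and for a descending sort the default is max. The plain-Python sketch below uses hypothetical helper names, and its optional fruit_name argument stands in for a nested filter on fruit.name. It shows why the default max does not answer use case 2 on the sample data, while a filtered sum does. The tie order in the max case comes from Python's stable sort, not from Elasticsearch.

```python
# Sample data copied from the question's PUT requests.
BASKETS = [
    {"name": "basket A", "fruit": [{"name": "apples", "number": 2},
                                   {"name": "oranges", "number": 3}]},
    {"name": "basket B", "fruit": [{"name": "apples", "number": 3},
                                   {"name": "apples", "number": 3}]},
]


def sort_value(basket, mode, fruit_name=None):
    """Reduce a basket's fruit.number values to one sort key.

    Assumes at least one entry survives the filter; Elasticsearch would
    instead apply its `missing` handling to baskets with no matching entry.
    """
    values = [f["number"] for f in basket["fruit"]
              if fruit_name is None or f["name"] == fruit_name]
    if mode == "max":
        return max(values)
    if mode == "sum":
        return sum(values)
    raise ValueError(f"unsupported mode: {mode}")


def sort_desc(baskets, mode, fruit_name=None):
    """Return (name, sort value) pairs, largest sort value first."""
    keyed = [(b["name"], sort_value(b, mode, fruit_name)) for b in baskets]
    return sorted(keyed, key=lambda pair: pair[1], reverse=True)


print(sort_desc(BASKETS, "max"))            # [('basket A', 3), ('basket B', 3)]
print(sort_desc(BASKETS, "sum", "apples"))  # [('basket B', 6), ('basket A', 2)]
```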

Use case 2 Edit:

There are probably cleaner ways of doing this, but I'd go with the following:

GET baskets/_search
{
  "size": 0,
  "aggs": {
    "multiply_and_add": {
      "scripted_metric": {
        "params": {
          "only_fruit_name": "apples"
        },
        "init_script": "state.by_basket_name = [:]",
        "map_script": """
          def basket_name = params._source['name'];
          def fruits = params._source['fruit'].findAll(group -> group.name == params.only_fruit_name);
          
          for (def fruit_group : fruits) {
            def number = fruit_group.number;
            
            if (state.by_basket_name.containsKey(basket_name)) {
              state.by_basket_name[basket_name] += number;
            } else {
              state.by_basket_name[basket_name] = number;
            }
          }
        """,
        "combine_script": "return state.by_basket_name",
        "reduce_script": "return states"
      }
    }
  }
}

yielding a hash map along the lines of:

{
  ...
  "aggregations":{
    "multiply_and_add":{
      "value":[
        {
          "basket A":2,
          "basket B":6
        }
      ]
    }
  }
}

Sorting can be done either in the reduce_script or in your own post-processing of the ES response. You could of course also choose to work with (sorted) lists and lambdas ...

Notice that the sort query for use case 2 has to specify the nested path (fruit).
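For the post-processing route, here is a minimal Python sketch, assuming the response has already been fetched and decoded from JSON (the RESPONSE_JSON string below just reproduces the sample output; the function names are hypothetical). Because "reduce_script": "return states" hands back the combine_script result of every shard as a separate list entry, the sketch first merges those per-shard maps and only then sorts by total.

```python
import json
from collections import Counter

# Shaped like the sample response above. With several shards, "value"
# would hold one map per shard, and a basket could appear in more than one.
RESPONSE_JSON = """
{
  "aggregations": {
    "multiply_and_add": {
      "value": [
        {"basket A": 2, "basket B": 6}
      ]
    }
  }
}
"""


def totals_by_basket(response):
    """Merge the per-shard maps returned by the scripted_metric aggregation."""
    merged = Counter()
    for shard_map in response["aggregations"]["multiply_and_add"]["value"]:
        if shard_map:  # skip empty (or missing) per-shard results
            merged.update(shard_map)  # Counter.update adds values per key
    return merged


def sorted_totals(response):
    """(basket name, total) pairs, largest total first."""
    return sorted(totals_by_basket(response).items(),
                  key=lambda kv: kv[1], reverse=True)


response = json.loads(RESPONSE_JSON)
print(sorted_totals(response))  # [('basket B', 6), ('basket A', 2)]
```

Merging with Counter keeps the sketch correct even when the same basket name shows up in several shard maps.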

Answer 2

After some searching and testing, here are possible queries for both use cases (in addition to @joe's answer to use case 2). Note that both use cases require changing the mapping of the field name to type keyword.

Use case 1: Which basket has (strictly) more than 5 apples? I would expect only basket B.

For more information on filtering results by their aggregation value, see Bucket Selectors.

GET /baskets/_search
{
  "aggs": {
    "aggs_baskets": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "nest":{
          "nested":{
            "path": "fruit"
          },
          "aggs":{
            "fruit_filter":{
              "filter": {
                "match": {"fruit.name": "apples"}
              },
              "aggs":{
                "fruit_sum":{
                  "sum": {"field": "fruit.number"}
                }
              }
            }
          }
        },
        "basket_sum_filter":{
          "bucket_selector":{
            "buckets_path":{
              "fruitSum":"nest > fruit_filter > fruit_sum"
            },
            "script":"params.fruitSum > 5"
          }
        }
      }
    }
  }
}

... yielding ...

...,

"buckets": [
    {
        "key": "basket B",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {
                "doc_count": 2,
                "fruit_sum": {
                    "value": 6
                }
            }
        }
    }
]
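The bucket_selector step can be read as a filter over the terms buckets: follow buckets_path through the sub-aggregations to a metric value, then keep the bucket only if the script returns true. The plain-Python sketch below illustrates that reading; resolve_buckets_path and bucket_selector are hypothetical helpers that only handle the ">"-separated path ending at a single-value metric used here, and basket A's values are taken from the use case 2 response in this answer.

```python
# Terms buckets shaped like the aggregation response, before bucket_selector runs.
BUCKETS = [
    {"key": "basket A", "doc_count": 1,
     "nest": {"doc_count": 2,
              "fruit_filter": {"doc_count": 1, "fruit_sum": {"value": 2}}}},
    {"key": "basket B", "doc_count": 1,
     "nest": {"doc_count": 2,
              "fruit_filter": {"doc_count": 2, "fruit_sum": {"value": 6}}}},
]


def resolve_buckets_path(bucket, path):
    """Follow e.g. 'nest > fruit_filter > fruit_sum' down to the metric's value."""
    node = bucket
    for name in path.split(">"):
        node = node[name.strip()]
    return node["value"]


def bucket_selector(buckets, path, predicate):
    """Keep only the buckets whose resolved value satisfies predicate."""
    return [b for b in buckets if predicate(resolve_buckets_path(b, path))]


selected = bucket_selector(BUCKETS, "nest > fruit_filter > fruit_sum",
                           lambda fruit_sum: fruit_sum > 5)
print([b["key"] for b in selected])  # ['basket B']
```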

Use case 2: Sort baskets by number of apples. I would expect basket B (6 apples in total) first, then basket A (2 apples in total).

GET /baskets/_search
{
  "aggs": {
    "aggs_baskets": {
      "terms": {
        "field": "name",
        "order": {"nest > fruit_filter > fruit_sum": "desc"}
      },
      "aggs": {
        "nest":{
          "nested":{
            "path": "fruit"
          },
          "aggs":{
            "fruit_filter":{
              "filter": {
                "term": {"fruit.name": "apples"}
              },
              "aggs":{
                "fruit_sum":{
                  "sum": {"field": "fruit.number"}
                }
              }
            }
          }
        }
      }
    }
  }
}

... yielding ...

...,

"buckets": [
    {
        "key": "basket B",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {
                "doc_count": 2,
                "fruit_sum": {
                    "value": 6
                }
            }
        }
    },
    {
        "key": "basket A",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {
                "doc_count": 1,
                "fruit_sum": {
                    "value": 2
                }
            }
        }
    }
]
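The "order" option of the terms aggregation sorts the basket buckets by the metric reached through the same kind of path, here nest > fruit_filter > fruit_sum, in descending order. A plain-Python sketch of that ordering follows (hypothetical helper names; the input buckets are deliberately listed in the "wrong" order). Unlike the scripted_metric approach, a basket with no apples should still get a bucket here, with a fruit_sum of 0, so it would sort last rather than disappear.

```python
# Terms buckets as they might arrive before ordering (values from the response above).
BUCKETS = [
    {"key": "basket A",
     "nest": {"fruit_filter": {"fruit_sum": {"value": 2}}}},
    {"key": "basket B",
     "nest": {"fruit_filter": {"fruit_sum": {"value": 6}}}},
]


def resolve_buckets_path(bucket, path):
    """Follow e.g. 'nest > fruit_filter > fruit_sum' down to the metric's value."""
    node = bucket
    for name in path.split(">"):
        node = node[name.strip()]
    return node["value"]


def order_buckets(buckets, path, order="desc"):
    """Sort buckets by the metric at path, like a terms aggregation 'order'."""
    if order not in ("asc", "desc"):
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    return sorted(buckets, key=lambda b: resolve_buckets_path(b, path),
                  reverse=(order == "desc"))


ordered = order_buckets(BUCKETS, "nest > fruit_filter > fruit_sum", "desc")
print([b["key"] for b in ordered])  # ['basket B', 'basket A']
```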

Answer 1: score 0, posted 2020-07-24 21:17:30

Answer 2: score 0, accepted, posted 2020-08-03 14:16:35
