
Elasticsearch aggregation over children document field values

I'm facing the problem of selecting and sorting parent documents based on a value aggregated over their child documents. The aggregation (e.g. sum) itself depends on a query string, i.e. on which child documents are relevant for the aggregation.

Example: Given the documents basket A and basket B, for each basket document I want to sum the number field of its fruit children whose name field matches my query, e.g. apples.

PUT /baskets/_doc/0
{
  "name": "basket A", 
  "fruit": [
    {
      "name": "apples",
      "number": 2
    },
    {
      "name": "oranges",
      "number": 3
    }
  ]
}

PUT /baskets/_doc/1
{
  "name": "basket B",
  "fruit": [
    {
      "name": "apples",
      "number": 3
    },
    {
      "name": "apples",
      "number": 3
    }
  ]
}

Mappings:

PUT /baskets
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "fruit": { 
        "type": "nested",
        "properties": {
          "name": { "type": "text" },
          "number": { "type": "long" }
        }
      }
    }
  }
}

  • Use case 1: Which basket has (strictly) more than 5 apples? Expected result: only basket B.
  • Use case 2: Sort baskets by number of apples. Expected result: basket B with a total of 6 apples, then basket A with a total of 2 apples.
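
To pin down what the two use cases mean, here is a minimal sketch in plain Python (not Elasticsearch) that computes the expected results from the example documents. The helper name fruit_totals is hypothetical and only serves as an illustration.

```python
# Example documents copied from the question.
baskets = [
    {"name": "basket A", "fruit": [{"name": "apples", "number": 2},
                                   {"name": "oranges", "number": 3}]},
    {"name": "basket B", "fruit": [{"name": "apples", "number": 3},
                                   {"name": "apples", "number": 3}]},
]


def fruit_totals(docs, fruit_name):
    """Sum `number` over the fruit children of each basket whose name matches."""
    return {
        doc["name"]: sum(f["number"] for f in doc["fruit"] if f["name"] == fruit_name)
        for doc in docs
    }


totals = fruit_totals(baskets, "apples")

# Use case 1: baskets with strictly more than 5 apples.
more_than_five = [name for name, total in totals.items() if total > 5]

# Use case 2: baskets sorted by their apple total, largest first.
by_apples = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(more_than_five)  # ['basket B']
print(by_apples)       # [('basket B', 6), ('basket A', 2)]
```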

How can one implement this using the Elasticsearch (7.8.0) query DSL?

So far I have tried nested queries and aggregations, without success.

Thanks!

Edit: Added mappings

Edit: Updated the numbers to better reflect the problem

Edit: Added a possible answer to use case 2 (see the comments on @joe's answer):

GET /baskets/_search
{
  "aggs": {
    "aggs_baskets": {
      "terms": {
        "field": "name",
        "order": {"nest > fruit_filter > fruit_sum": "desc"}
      },
      "aggs": {
        "nest":{
          "nested":{
            "path": "fruit"
          },
          "aggs":{
            "fruit_filter":{
              "filter": {
                "term": {"fruit.name": "apples"}
              },
              "aggs":{
                "fruit_sum":{
                  "sum": {"field": "fruit.number"}
                }
              }
            }
          }
        }
      }
    }
  }
}

Use case 1:

GET baskets/_search
{
  "query": {
    "nested": {
      "path": "fruit",
      "inner_hits": {}, 
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "fruit.name": {
                  "value": "apples"
                }
              }
            },
            {
              "range": {
                "fruit.number": {
                  "gte": 5
                }
              }
            }
          ]
        }
      }
    }
  }
}

For strictly more than 5, use gt instead; gte, as used above, matches values >= 5.

Also notice the inner_hits part: it returns the actual nested subdocuments that caused a particular basket to match the query. It's not required, but good to know.
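
One caveat worth checking against your data: a nested query evaluates its clauses against each nested fruit object separately, so the range clause compares the number of a single child, not a per-basket sum. The following plain-Python sketch (not Elasticsearch; both helper names are hypothetical) illustrates the difference for basket B.

```python
# basket B from the question: two apple entries of 3 each.
basket_b = {"name": "basket B", "fruit": [{"name": "apples", "number": 3},
                                          {"name": "apples", "number": 3}]}


def any_child_matches(doc, fruit_name, minimum):
    """Mimics the nested query: one single child must satisfy both clauses."""
    return any(f["name"] == fruit_name and f["number"] >= minimum
               for f in doc["fruit"])


def summed_children_match(doc, fruit_name, minimum):
    """What the question asks for: compare the per-basket sum."""
    total = sum(f["number"] for f in doc["fruit"] if f["name"] == fruit_name)
    return total >= minimum


per_child = any_child_matches(basket_b, "apples", 5)   # no single child has >= 5
summed = summed_children_match(basket_b, "apples", 5)  # 3 + 3 = 6 >= 5

print(per_child, summed)  # False True
```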

Use case 2:

GET baskets/_search
{
  "sort": [
    {
      "fruit.number": {
        "nested_path": "fruit",
        "order": "desc"
      }
    }
  ]
}

Use case 2 Edit:

There are probably cleaner ways of doing this, but I'd go with the following:

GET baskets/_search
{
  "size": 0,
  "aggs": {
    "multiply_and_add": {
      "scripted_metric": {
        "params": {
          "only_fruit_name": "apples"
        },
        "init_script": "state.by_basket_name = [:]",
        "map_script": """
          def basket_name = params._source['name'];
          def fruits = params._source['fruit'].findAll(group -> group.name == params.only_fruit_name);
          
          for (def fruit_group : fruits) {
            def number = fruit_group.number;
            
            if (state.by_basket_name.containsKey(basket_name)) {
              state.by_basket_name[basket_name] += number;
            } else {
              state.by_basket_name[basket_name] = number;
            }
          }
        """,
        "combine_script": "return state.by_basket_name",
        "reduce_script": "return states"
      }
    }
  }
}

yielding a hash map along the lines of

{
  ...
  "aggregations":{
    "multiply_and_add":{
      "value":[
        {
          "basket A":2,
          "basket B":6
        }
      ]
    }
  }
}

Sorting can be done either in the reduce_script or in your ES response post-processing pipeline. You could of course choose to go with (sorted) lists and lambdas instead.
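
As a rough sketch of the post-processing option, the Python snippet below sorts the returned map on the client side. It assumes the response shape shown in the example output; since reduce_script returns states unchanged, the value list may hold one map per shard, so the maps are merged first. The function name sorted_totals is hypothetical.

```python
from collections import Counter

# Response shape taken from the example output of the scripted_metric aggregation.
response = {
    "aggregations": {
        "multiply_and_add": {
            "value": [{"basket A": 2, "basket B": 6}]
        }
    }
}


def sorted_totals(resp, agg_name="multiply_and_add"):
    """Merge the per-shard maps and sort baskets by total, largest first."""
    merged = Counter()
    for shard_map in resp["aggregations"][agg_name]["value"]:
        merged.update(shard_map)  # Counter.update adds values for equal keys
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


ranking = sorted_totals(response)
print(ranking)  # [('basket B', 6), ('basket A', 2)]
```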

Notice the required nested_path in the sort clause of the first use case 2 query.

After a while of searching and testing, here are (in addition to @joe's answer to use case 2) possible queries for both use cases. Note that both use cases require changing the mapping of the name field to type keyword.

Use case 1: Which basket has (strictly) more than 5 apples? Expected result: only basket B.

For more information on filtering results by their aggregation value, see Bucket Selectors.

GET /baskets/_search
{
  "aggs": {
    "aggs_baskets": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "nest":{
          "nested":{
            "path": "fruit"
          },
          "aggs":{
            "fruit_filter":{
              "filter": {
                "match": {"fruit.name": "apples"}
              },
              "aggs":{
                "fruit_sum":{
                  "sum": {"field": "fruit.number"}
                }
              }
            }
          }
        },
        "basket_sum_filter":{
          "bucket_selector":{
            "buckets_path":{
              "fruitSum":"nest > fruit_filter > fruit_sum"
            },
            "script":"params.fruitSum > 5"
          }
        }
      }
    }
  }
}

... yielding ...

...,

"buckets": [
    {
        "key": "basket B",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {
                "doc_count": 2,
                "fruit_sum": {
                    "value": 6
                }
            }
        }
    }
]

Use case 2: Sort baskets by number of apples. Expected result: basket B with a total of 6 apples, then basket A with a total of 2 apples.

GET /baskets/_search
{
  "aggs": {
    "aggs_baskets": {
      "terms": {
        "field": "name",
        "order": {"nest > fruit_filter > fruit_sum": "desc"}
      },
      "aggs": {
        "nest":{
          "nested":{
            "path": "fruit"
          },
          "aggs":{
            "fruit_filter":{
              "filter": {
                "term": {"fruit.name": "apples"}
              },
              "aggs":{
                "fruit_sum":{
                  "sum": {"field": "fruit.number"}
                }
              }
            }
          }
        }
      }
    }
  }
}

... yielding ...

...,

"buckets": [
    {
        "key": "basket B",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {
                "doc_count": 2,
                "fruit_sum": {
                    "value": 6
                }
            }
        }
    },
    {
        "key": "basket A",
        "doc_count": 1,
        "nest": {
            "doc_count": 2,
            "fruit_filter": {
                "doc_count": 1,
                "fruit_sum": {
                    "value": 2
                }
            }
        }
    }
]
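
To consume this result on the client side, a small Python sketch like the following could flatten the buckets into (basket, sum) pairs. The buckets list is copied from the excerpt above; where it sits inside the full response is not shown there, and the function name bucket_sums is hypothetical.

```python
# Buckets copied from the response excerpt above.
buckets = [
    {"key": "basket B", "doc_count": 1,
     "nest": {"doc_count": 2,
              "fruit_filter": {"doc_count": 2, "fruit_sum": {"value": 6}}}},
    {"key": "basket A", "doc_count": 1,
     "nest": {"doc_count": 2,
              "fruit_filter": {"doc_count": 1, "fruit_sum": {"value": 2}}}},
]


def bucket_sums(bucket_list):
    """Follow the nest > fruit_filter > fruit_sum path in every bucket."""
    return [
        (bucket["key"], bucket["nest"]["fruit_filter"]["fruit_sum"]["value"])
        for bucket in bucket_list
    ]


pairs = bucket_sums(buckets)
print(pairs)  # [('basket B', 6), ('basket A', 2)]
```

The order of the pairs is simply the order in which Elasticsearch returned the buckets, so no client-side sorting is needed here.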
