
Elastic Search aggregation filter array for condition

My data looks like the following:

[
    {
        "name": "Scott",
        "origin": "London",
        "travel": [
            {
                "active": false,
                "city": "Berlin",
                "visited": "2020-02-01"
            },
            {
                "active": true,
                "city": "Prague",
                "visited": "2020-02-15"
            }
        ]
    },
    {
        "name": "Lilly",
        "origin": "London",
        "travel": [
            {
                "active": true,
                "city": "Scotland",
                "visited": "2020-02-01"
            }
        ]
    }
]

I want to perform an aggregation where each top-level origin is a bucket, followed by a nested aggregation that shows how many people are currently visiting each city. I therefore only care what the city is if active is true.

When I use a filter, it searches the travel array and returns the complete array (both objects) if any one of them has active set to true. I do not want to include cities where active is false.

Desired output:

{
  "aggregations": {
    "origin": {
      "buckets": [
        {
          "key": "London",
          "buckets": [
            {
              "key": "travel",
              "doc_count": 2555,
              "buckets": [
                {
                  "key": "Scotland",
                  "doc_count": 1
                },
                {
                  "key": "Prague",
                  "doc_count": 1
                }
              ]
            }
          ]
        }
      ]
    }
  }
}

Above, there are only 2 city counts under the travel aggregation because only two travel objects have active set to true.

Currently, I have my aggregation set up like so:

{
  "from": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "travel": {
          "filter": {
            "term": {
              "travel.active": true
            }
          },
          "aggs": {
            "city": {
              "terms": {
                "field": "city"
              }
            }
          }
        }
      }
    }
  }
}

I have my top-level aggregation on origin, then a nested agg on the travel array. Here I have a filter on travel.active = true, then another nested agg to create buckets for each city.

My aggregation still produces Berlin as a city, even though I'm filtering on active = true.

My guess is that the document is allowed through because active: true holds for one of the objects in the array.

How do I filter out active: false completely from the aggregation?

You will have to use a nested aggregation. See the official documentation for reference.
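To see why the original filter lets Berlin through, here is a minimal Python sketch of how a plain (non-nested) object array is commonly described as being indexed: each sub-field becomes one multi-valued field and the pairing between active and city is lost. The helper name flatten_object_array is hypothetical and only models the behaviour; it is not an Elasticsearch API.

```python
from collections import Counter

# Sample document from the question.
scott = {
    "name": "Scott",
    "origin": "London",
    "travel": [
        {"active": False, "city": "Berlin", "visited": "2020-02-01"},
        {"active": True, "city": "Prague", "visited": "2020-02-15"},
    ],
}


def flatten_object_array(doc, field):
    """Hypothetical model of a non-nested object array: one multi-valued
    field per sub-key, with the per-object grouping discarded."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


flat = flatten_object_array(scott, "travel")

# The term filter only asks: does travel.active contain true anywhere?
matches_filter = True in flat["travel.active"]

# A terms aggregation on a matching document then sees every city value.
city_counts = Counter(flat["travel.city"]) if matches_filter else Counter()

print(flat["travel.active"])
print(dict(city_counts))
```

Under this model the document passes the filter as a whole, so Berlin is counted alongside Prague. Mapping travel as nested keeps each object as its own hidden document, which is what allows the filter to apply per entry.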

Here is an example for your query:

Mapping:

PUT /city_index
{
  "mappings": {
    "properties": {
      "name" : { "type" : "keyword" },
      "origin" : { "type" : "keyword" },
      "travel": { 
        "type": "nested",
        "properties": {
          "active": {
            "type": "boolean"
          },
          "city": {
            "type": "keyword"
          },
          "visited" : {
            "type":"date"
          }
        }
      }
    }
  }
}

Insert:

PUT /city_index/_doc/1
{
  "name": "Scott", 
  "origin" : "London",
  "travel": [
    {
      "active": false,
      "city": "Berlin",
      "visited" : "2020-02-01"
    },
    {
      "active": true,
      "city": "Prague",
      "visited": "2020-02-15"
    }
  ]
}

PUT /city_index/_doc/2
{
  "name": "Lilly",
  "origin": "London",
  "travel": [
    {
      "active": true,
      "city": "Scotland",
      "visited": "2020-02-01"
    }
  ]
}

Query:

GET /city_index/_search
{
  "size": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Output:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "origin": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "London",
          "doc_count": 2,
          "city": {
            "doc_count": 3,
            "travel": {
              "doc_count": 2,
              "city": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Prague",
                    "doc_count": 1
                  },
                  {
                    "key": "Scotland",
                    "doc_count": 1
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
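The counts in this output can be cross-checked by hand. The following Python sketch mirrors the three aggregation steps (terms on origin, nested on travel, filter on active) over the two sample documents. It is only a model of the logic, not a client call, and the function name aggregate and the summary keys are made up for illustration.

```python
from collections import Counter, defaultdict

# The two sample documents indexed above.
docs = [
    {
        "name": "Scott",
        "origin": "London",
        "travel": [
            {"active": False, "city": "Berlin", "visited": "2020-02-01"},
            {"active": True, "city": "Prague", "visited": "2020-02-15"},
        ],
    },
    {
        "name": "Lilly",
        "origin": "London",
        "travel": [
            {"active": True, "city": "Scotland", "visited": "2020-02-01"},
        ],
    },
]


def aggregate(documents):
    """Hypothetical model of: terms(origin) > nested(travel) > filter(active) > terms(city)."""
    by_origin = defaultdict(list)
    for doc in documents:
        by_origin[doc["origin"]].append(doc)

    result = {}
    for origin, group in by_origin.items():
        nested = [t for d in group for t in d["travel"]]  # nested step
        active = [t for t in nested if t["active"]]       # filter step
        result[origin] = {
            "doc_count": len(group),
            "nested_doc_count": len(nested),
            "active_doc_count": len(active),
            "cities": dict(Counter(t["city"] for t in active)),
        }
    return result


summary = aggregate(docs)
print(summary)
```

The numbers line up with the response: 2 parent documents, 3 nested travel entries, 2 of them active. Note that the city counts are counts of nested travel entries, which equals the number of people only as long as each person has at most one active entry per city.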

The suggestion from @karthick is good, but I would also add the filter to the query. That way, fewer documents reach the aggregation stage. The filter aggregation is still needed inside the nested aggregation, because the nested query matches whole parent documents and would otherwise let inactive entries such as Berlin through.

GET idx_travel/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "travel",
      "query": {
        "term": {
          "travel.active": {
            "value": true
          }
        }
      }
    }
  },
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
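One point worth checking with a query-level filter: a nested query selects whole parent documents, so every travel entry of a matching parent is still visible to a nested aggregation. The Python sketch below models that assumption with the sample data. The function names are hypothetical and nothing here calls Elasticsearch.

```python
# Sample documents from the question, reduced to the relevant fields.
docs = [
    {
        "name": "Scott",
        "origin": "London",
        "travel": [
            {"active": False, "city": "Berlin"},
            {"active": True, "city": "Prague"},
        ],
    },
    {
        "name": "Lilly",
        "origin": "London",
        "travel": [{"active": True, "city": "Scotland"}],
    },
]


def parents_matching_nested_query(documents):
    """Model of the nested query: keep a parent if ANY travel entry is active."""
    return [d for d in documents if any(t["active"] for t in d["travel"])]


def aggregated_cities(documents, only_active):
    """Model of the nested terms aggregation, with or without the filter step."""
    return sorted(
        t["city"]
        for d in documents
        for t in d["travel"]
        if t["active"] or not only_active
    )


matched = parents_matching_nested_query(docs)
without_filter = aggregated_cities(matched, only_active=False)
with_filter = aggregated_cities(matched, only_active=True)

print(without_filter)
print(with_filter)
```

In this model the query alone still lets Berlin through, because Scott matches via Prague. The query reduces how many parents are aggregated, while the filter aggregation is what removes inactive entries, so the two are complementary rather than interchangeable.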
