Elasticsearch aggregation: filter array elements by a condition
My data looks like the following:
[
  {
    "name": "Scott",
    "origin": "London",
    "travel": [
      {
        "active": false,
        "city": "Berlin",
        "visited": "2020-02-01"
      },
      {
        "active": true,
        "city": "Prague",
        "visited": "2020-02-15"
      }
    ]
  },
  {
    "name": "Lilly",
    "origin": "London",
    "travel": [
      {
        "active": true,
        "city": "Scotland",
        "visited": "2020-02-01"
      }
    ]
  }
]
I want to perform an aggregation where each top-level origin is a bucket, then a nested aggregation to see how many people are currently visiting each city. I therefore only care what the city is if active is true.
Using a filter, it will search the travel array and return the complete array (both objects) if one of them has active set to true. I do not want to include cities where active is false.
Desired output:
{
  "aggregations": {
    "origin": {
      "buckets": [
        {
          "key": "London",
          "buckets": [
            {
              "key": "travel",
              "doc_count": 2555,
              "buckets": [
                {
                  "key": "Scotland",
                  "doc_count": 1
                },
                {
                  "key": "Prague",
                  "doc_count": 1
                }
              ]
            }
          ]
        }
      ]
    }
  }
}
Above, there are only two counts under the travel aggregation because only two travel objects have active set to true.
Currently, I have my aggregation set up like so:
{
  "from": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "travel": {
          "filter": {
            "term": {
              "travel.active": true
            }
          },
          "aggs": {
            "city": {
              "terms": {
                "field": "city"
              }
            }
          }
        }
      }
    }
  }
}
My top-level aggregation is on origin, followed by a nested aggregation on the travel array. There I filter on travel.active = true, then use another nested aggregation to create a bucket for each city.
The aggregation still produces Berlin as a city, even though I'm filtering on active = true.
My guess is that the document is let through because active: true holds for one of the objects in the array.
How do I filter out active: false completely from the aggregation?
You will have to use a nested aggregation. See the official documentation for reference.
Here is an example for your query:
Mapping:
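As a rough illustration of why the nested type is needed, the sketch below imitates in plain Python how an array of objects is indexed when the field is not mapped as nested: each sub-field is flattened into one multi-valued field, so the pairing between active and city inside each object is lost. The helper name flatten_object_field is made up for this example and is not an Elasticsearch API; the code only models the behaviour.

```python
from collections import defaultdict


def flatten_object_field(doc, field):
    """Model of non-nested indexing (hypothetical helper, not an ES API):
    every sub-field becomes one flat multi-valued field."""
    flat = defaultdict(list)
    for obj in doc[field]:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


scott = {
    "name": "Scott",
    "origin": "London",
    "travel": [
        {"active": False, "city": "Berlin", "visited": "2020-02-01"},
        {"active": True, "city": "Prague", "visited": "2020-02-15"},
    ],
}

flat = flatten_object_field(scott, "travel")

# A term filter on travel.active = true matches the document as a whole ...
matches = True in flat["travel.active"]
# ... so a terms aggregation on the city field sees every city of that document.
cities = flat["travel.city"] if matches else []

print(flat["travel.active"])  # [False, True]
print(cities)                 # ['Berlin', 'Prague']
```

With a nested mapping each travel object is indexed as its own hidden document, so the filter can be evaluated per object instead of per parent document.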
PUT /city_index
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "origin": { "type": "keyword" },
      "travel": {
        "type": "nested",
        "properties": {
          "active": {
            "type": "boolean"
          },
          "city": {
            "type": "keyword"
          },
          "visited": {
            "type": "date"
          }
        }
      }
    }
  }
}
Insert:
PUT /city_index/_doc/1
{
  "name": "Scott",
  "origin": "London",
  "travel": [
    {
      "active": false,
      "city": "Berlin",
      "visited": "2020-02-01"
    },
    {
      "active": true,
      "city": "Prague",
      "visited": "2020-02-15"
    }
  ]
}
PUT /city_index/_doc/2
{
  "name": "Lilly",
  "origin": "London",
  "travel": [
    {
      "active": true,
      "city": "Scotland",
      "visited": "2020-02-01"
    }
  ]
}
Query:
GET /city_index/_search
{
  "size": 0,
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Output:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "origin": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "London",
          "doc_count": 2,
          "city": {
            "doc_count": 3,
            "travel": {
              "doc_count": 2,
              "city": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "Prague",
                    "doc_count": 1
                  },
                  {
                    "key": "Scotland",
                    "doc_count": 1
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
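To see where the three doc_count values in the output (2, 3 and 2) come from, the aggregation steps can be traced by hand. The following is a small pure-Python sketch over the two sample documents; it only mirrors the terms, nested, filter and terms stages and does not talk to Elasticsearch. The function name aggregate and the result keys are chosen for this illustration.

```python
from collections import Counter

docs = [
    {"name": "Scott", "origin": "London", "travel": [
        {"active": False, "city": "Berlin", "visited": "2020-02-01"},
        {"active": True, "city": "Prague", "visited": "2020-02-15"},
    ]},
    {"name": "Lilly", "origin": "London", "travel": [
        {"active": True, "city": "Scotland", "visited": "2020-02-01"},
    ]},
]


def aggregate(docs):
    """Trace terms(origin) -> nested(travel) -> filter(active) -> terms(city)."""
    result = {}
    for doc in docs:
        bucket = result.setdefault(doc["origin"], {
            "doc_count": 0,         # parent documents per origin
            "nested_doc_count": 0,  # all travel objects (nested stage)
            "active_doc_count": 0,  # travel objects passing the filter
            "cities": Counter(),    # terms on travel.city after the filter
        })
        bucket["doc_count"] += 1
        for trip in doc["travel"]:
            bucket["nested_doc_count"] += 1
            if trip["active"]:
                bucket["active_doc_count"] += 1
                bucket["cities"][trip["city"]] += 1
    return result


london = aggregate(docs)["London"]
print(london["doc_count"], london["nested_doc_count"], london["active_doc_count"])
# 2 3 2
print(dict(london["cities"]))
# {'Prague': 1, 'Scotland': 1}
```

The middle count of 3 is the number of travel objects before filtering, which is why Berlin is counted at the nested stage but no longer appears in the city buckets.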
The suggestion from @karthick is good, but I would also add the filter to the query. That way fewer documents reach the aggregation stage. Note that the nested query only selects parent documents that have at least one active travel object, so the filter inside the nested aggregation is still needed to keep inactive cities such as Berlin out of the buckets.
GET idx_travel/_search
{
  "size": 0,
  "query": {
    "nested": {
      "path": "travel",
      "query": {
        "term": {
          "travel.active": {
            "value": true
          }
        }
      }
    }
  },
  "aggs": {
    "origin": {
      "terms": {
        "field": "origin"
      },
      "aggs": {
        "city": {
          "nested": {
            "path": "travel"
          },
          "aggs": {
            "travel": {
              "filter": {
                "term": {
                  "travel.active": true
                }
              },
              "aggs": {
                "city": {
                  "terms": {
                    "field": "travel.city"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
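The difference between filtering in the query and filtering inside the nested aggregation can be checked with a small simulation. This is only a sketch in plain Python, not a call to Elasticsearch: nested_query_active models a nested query that keeps parent documents with at least one active travel object, and city_counts models the nested terms aggregation with or without the filter. Both function names, and the third document (Alex), are hypothetical additions for illustration.

```python
from collections import Counter

docs = [
    {"name": "Scott", "origin": "London", "travel": [
        {"active": False, "city": "Berlin", "visited": "2020-02-01"},
        {"active": True, "city": "Prague", "visited": "2020-02-15"},
    ]},
    {"name": "Lilly", "origin": "London", "travel": [
        {"active": True, "city": "Scotland", "visited": "2020-02-01"},
    ]},
    # Hypothetical extra document with no active trip, not in the original data.
    {"name": "Alex", "origin": "Paris", "travel": [
        {"active": False, "city": "Rome", "visited": "2020-01-10"},
    ]},
]


def nested_query_active(docs):
    """Model of the nested query: keep parents with any active travel object."""
    return [d for d in docs if any(t["active"] for t in d["travel"])]


def city_counts(docs, only_active):
    """Model of nested + terms on travel.city, optionally with the filter."""
    counts = Counter()
    for d in docs:
        for t in d["travel"]:
            if t["active"] or not only_active:
                counts[t["city"]] += 1
    return dict(counts)


selected = nested_query_active(docs)
print([d["name"] for d in selected])
# ['Scott', 'Lilly']
print(city_counts(selected, only_active=False))
# {'Berlin': 1, 'Prague': 1, 'Scotland': 1}
print(city_counts(selected, only_active=True))
# {'Prague': 1, 'Scotland': 1}
```

The query drops the parent document that has no active trip at all, which reduces the work for the aggregation, but Berlin only disappears once the per-object filter is applied as well.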