
Elastic query aggregate by specified time range a day

Hi, I need to write a query that aggregates data by work shifts, i.e. by a fixed time-of-day range within each day. The issue is that I do not want to list every range explicitly in the date_range aggregation; I just want to specify the from -> to time range that applies to each day of the aggregation. Is there an easy way to do this?

I have this kind of query:

{
    "_source": false,
    "size": 10000,
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "streamId": [
                            "ENRG_0054"
                        ]
                    }
                },
                {
                    "range": {
                        "timestamp": {
                            "gte": "2021-02-01T00:00:00Z",
                            "lte": "2021-02-10T01:00:00Z"
                        }
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "timestamp": {
                "order": "asc"
            }
        },
        {
            "_score": {
                "order": "asc"
            }
        }
    ],
    "aggs": {
        "streamId": {
            "terms": {
                "field": "streamId",
                "size": 10000
            },
            "aggs": {
                "days": {
                    "date_histogram": {
                        "field": "timestamp",
                        "interval": "1d"
                    },
                    "aggs": {
                        "shifts": {
                            "date_range": {
                                "field": "timestamp",
                                "format": "HH:mm",
                                "ranges": [
                                    {
                                        "key": "MORNING",
                                        "from": "06:00",
                                        "to": "14:00"
                                    },
                                    {
                                        "key": "AFTERNOON",
                                        "from": "14:00",
                                        "to": "22:00"
                                    }
                                ],
                                "keyed": true
                            },
                            "aggs": {
                                "MAX": {
                                    "max": {
                                        "field": "@floatMessage.value.value"
                                    }
                                },
                                "MIN": {
                                    "min": {
                                        "field": "@floatMessage.value.value"
                                    }
                                },
                                "DIFF": {
                                    "bucket_script": {
                                        "buckets_path": {
                                            "min": "MIN",
                                            "max": "MAX"
                                        },
                                        "script": {
                                            "source": "return (params.max-params.min)"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

But in the result I am getting null for the values, because the time ranges are not specified with a date:

  "aggregations": {
        "streamId": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "ENRG_0054",
                    "doc_count": 13343,
                    "days": {
                        "buckets": [
                            {
                                "key_as_string": "2021-02-01T00:00:00.000Z",
                                "key": 1612137600000,
                                "doc_count": 2763,
                                "shifts": {
                                    "buckets": {
                                        "MORNING": {
                                            "from": 2.16E7,
                                            "from_as_string": "06:00",
                                            "to": 5.04E7,
                                            "to_as_string": "14:00",
                                            "doc_count": 0,
                                            "MIN": {
                                                "value": null
                                            },
                                            "MAX": {
                                                "value": null
                                            }
                                        },
                                        "AFTERNOON": {
                                            "from": 5.04E7,
                                            "from_as_string": "14:00",
                                            "to": 7.92E7,
                                            "to_as_string": "22:00",
                                            "doc_count": 0,
                                            "MIN": {
                                                "value": null
                                            },
                                            "MAX": {
                                                "value": null
                                            }
                                        }
                                    }
                                }
                            },

Example doc:

{
    "streamId": "ENRG_0054",
    "created": "2021-02-01T00:19:42.905Z",
    "extra": {},
    "location": null,
    "model": "floatMessage",
    "id": "6017491eb112b21488f6c843",
    "value": {
      "unit": "°C",
      "value": 18.94,
      "messageProcessed": "2021-02-01T00:19:41.595Z"
    },
    "timestamp": "2021-02-01T00:19:39.161Z",
    "tags": []
  }


When I generate all the date_range entries for the desired timestamp range of the whole query, the result is OK. Is this the only way to get the desired result, or can somebody suggest how to update the query to meet my requirements? Thanks.
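For reference, the workaround mentioned above (one explicit range per shift per day) might be generated roughly like this. This is only a sketch: the `build_shift_ranges` helper and the `<day>_<SHIFT>` key naming are hypothetical, and it assumes the `format` parameter is dropped from the date_range aggregation so that full ISO timestamps are parsed with the field's own date format.

```python
from datetime import date, timedelta
from typing import Dict, List

# Shift boundaries taken from the query above (time of day, UTC).
SHIFTS = {"MORNING": ("06:00", "14:00"), "AFTERNOON": ("14:00", "22:00")}

def build_shift_ranges(first_day: date, last_day: date) -> List[Dict[str, str]]:
    """Build one date_range entry per shift for every day in [first_day, last_day]."""
    ranges = []
    day = first_day
    while day <= last_day:
        for name, (start, end) in SHIFTS.items():
            ranges.append({
                "key": f"{day.isoformat()}_{name}",       # hypothetical key naming
                "from": f"{day.isoformat()}T{start}:00Z",
                "to": f"{day.isoformat()}T{end}:00Z",
            })
        day += timedelta(days=1)
    return ranges

# Same period as the range filter in the query: 2021-02-01 .. 2021-02-10.
ranges = build_shift_ranges(date(2021, 2, 1), date(2021, 2, 10))
print(len(ranges), ranges[0])
```

The resulting list would go into the `ranges` array of the date_range aggregation. It grows with the number of days queried, which is exactly what I would like to avoid.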

The reason you're not seeing any documents inside the buckets of the date_range aggregation has to do with datetime vs. date inference, similar to the one I discussed here a while ago.

In short, the date_range aggregation behaves confusingly when handling time values (HH:mm) as opposed to full datetime values (MM-dd-yyyy HH:mm) because:

  • if no year is provided, it'll default to 1970
  • if no month is provided, it'll default to Jan
  • if no day is provided, it'll default to the 1st of the month
  • and so on.
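To make the effect of those defaults concrete, here is a small Python sketch (not Elasticsearch code) that resolves the shift boundaries against 1970-01-01 UTC and compares them with a document timestamp from 2021. The helper names are made up for illustration, and the 06:30 timestamp is a hypothetical document that "should" fall into the morning shift.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)

def time_only_to_epoch_millis(hhmm: str) -> int:
    """Resolve an HH:mm value with the missing date parts defaulting to 1970-01-01 (UTC)."""
    hours, minutes = (int(part) for part in hhmm.split(":"))
    return timedelta(hours=hours, minutes=minutes) // ONE_MS

def to_epoch_millis(iso_utc: str) -> int:
    """Convert a full UTC timestamp such as 2021-02-01T06:30:00Z to epoch milliseconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // ONE_MS

morning_from = time_only_to_epoch_millis("06:00")  # 21600000, the 2.16E7 seen in the response
morning_to = time_only_to_epoch_millis("14:00")    # 50400000, the 5.04E7 seen in the response

# A hypothetical document logged during the morning shift in 2021.
doc_ts = to_epoch_millis("2021-02-01T06:30:00Z")

# The bucket covers a few hours of 1970-01-01, so a 2021 timestamp can never match it.
in_bucket = morning_from <= doc_ts < morning_to
print(morning_from, morning_to, doc_ts, in_bucket)
```

The bucket boundaries are a few million milliseconds after the epoch, while any 2021 timestamp is on the order of 1.6 trillion, which is why every shift bucket reports a doc_count of 0.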

You see, if you added just the year component:

"date_range": {
  "field": "timestamp",
  "format": "HH:mm yyyy",    <---
  "ranges": [
    {
      "key": "MORNING",
      "from": "06:00 2021",  <---
      "to": "14:00 2021"     <---
    }
  ],
  "keyed": true
}

Elasticsearch would return:

"MORNING" : {
  "from" : 2.16E7,
  "from_as_string" : "06:00 1970",   <--- 🥴
  "to" : 5.04E7,
  "to_as_string" : "14:00 1970",     <--- 🥴
  ...
}

Adding a month would solve this particular point-in-time problem but would of course introduce the problem of only being able to aggregate on one single month of one concrete year.

So I'd propose the following:

  1. Add one more date field, called time, to your mapping:
{
  "mappings": {
    "properties": {
      "streamId": {
        "type": "keyword"
      },
      ...
      "time": {
        "type": "date",             <---
        "format": "HH:mm:ss.SSSz"
      }
    }
  }
}
  2. Add this new field to each doc (or use an ingest pipeline, or a scripted _update_by_query call):
{
  "streamId": "ENRG_0054",
  ...
  "timestamp": "2021-02-01T00:19:39.161Z",
  "time": "00:19:39.161Z",                 <---
  "tags": []
}
  3. Use the same query as above but aggregate on the time field instead:
"days": {
  "date_histogram": {
    "field": "timestamp",     <---
    "interval": "1d"
  },
  "aggs": {
    "shifts": {
      "date_range": {
        "field": "time",      <---
        "format": "HH:mm",
        "ranges": [

That's all there is to it!

PS: Under the hood, the time values will be auto-assigned to 1970, but that's fine because you're only interested in the time of day.
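If you go the "add the field to each doc" route, the time value can be derived client-side just before indexing. A minimal sketch, assuming every timestamp is an ISO-8601 string with a `T` separator like the example doc (the `add_time_field` helper is hypothetical, not part of any Elasticsearch client):

```python
def add_time_field(doc: dict) -> dict:
    """Return a copy of doc with a `time` field holding the time-of-day part of `timestamp`."""
    # "2021-02-01T00:19:39.161Z" -> "00:19:39.161Z"
    _, time_part = doc["timestamp"].split("T", 1)
    enriched = dict(doc)  # shallow copy so the original document is left untouched
    enriched["time"] = time_part
    return enriched

doc = {
    "streamId": "ENRG_0054",
    "timestamp": "2021-02-01T00:19:39.161Z",
    "tags": [],
}
enriched = add_time_field(doc)
print(enriched["time"])
```

Doing this in the indexing code keeps the mapping simple; an ingest pipeline or _update_by_query script would be the server-side equivalent for documents that are already indexed.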
