Elastic query aggregate by specified time range a day
Hi, I need to write a query that aggregates data by work shifts, i.e. by a selected time range within each day. The issue is that I do not want to list every range explicitly in the date_range aggregation; I only want to specify the from -> to time range that applies to each day of the aggregation. Is there an easy way to do this?
I have this kind of query:
{
"_source": false,
"size": 10000,
"query": {
"bool": {
"must": [
{
"terms": {
"streamId": [
"ENRG_0054"
]
}
},
{
"range": {
"timestamp": {
"gte": "2021-02-01T00:00:00Z",
"lte": "2021-02-10T01:00:00Z"
}
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "asc"
}
},
{
"_score": {
"order": "asc"
}
}
],
"aggs": {
"streamId": {
"terms": {
"field": "streamId",
"size": 10000
},
"aggs": {
"days": {
"date_histogram": {
"field": "timestamp",
"interval": "1d"
},
"aggs": {
"shifts": {
"date_range": {
"field": "timestamp",
"format": "HH:mm",
"ranges": [
{
"key": "MORNING",
"from": "06:00",
"to": "14:00"
},
{
"key": "AFTERNOON",
"from": "14:00",
"to": "22:00"
}
],
"keyed": true
},
"aggs": {
"MAX": {
"max": {
"field": "@floatMessage.value.value"
}
},
"MIN": {
"min": {
"field": "@floatMessage.value.value"
}
},
"DIFF": {
"bucket_script": {
"buckets_path": {
"min": "MIN",
"max": "MAX"
},
"script": {
"source": "return (params.max-params.min)"
}
}
}
}
}
}
}
}
}
}
}
But in the result I am getting null for the values, as the time ranges are not specified with a date:
"aggregations": {
"streamId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ENRG_0054",
"doc_count": 13343,
"days": {
"buckets": [
{
"key_as_string": "2021-02-01T00:00:00.000Z",
"key": 1612137600000,
"doc_count": 2763,
"shifts": {
"buckets": {
"MORNING": {
"from": 2.16E7,
"from_as_string": "06:00",
"to": 5.04E7,
"to_as_string": "14:00",
"doc_count": 0,
"MIN": {
"value": null
},
"MAX": {
"value": null
}
},
"AFTERNOON": {
"from": 5.04E7,
"from_as_string": "14:00",
"to": 7.92E7,
"to_as_string": "22:00",
"doc_count": 0,
"MIN": {
"value": null
},
"MAX": {
"value": null
}
}
}
}
},
Example doc:
{
"streamId": "ENRG_0054",
"created": "2021-02-01T00:19:42.905Z",
"extra": {},
"location": null,
"model": "floatMessage",
"id": "6017491eb112b21488f6c843",
"value": {
"unit": "°C",
"value": 18.94,
"messageProcessed": "2021-02-01T00:19:41.595Z"
},
"timestamp": "2021-02-01T00:19:39.161Z",
"tags": []
}
When I generate all the date_ranges for the desired timestamp range of the whole query, the result is OK. Is this the only way to get the desired result, or can somebody suggest how to update the query to meet my requirements? Thanks.
The reason you're not seeing any buckets inside of the date_range aggregation has to do with the datetime vs date inference, similar to one I discussed a while ago.
In short, the date_range aggregation appears confusing when handling time values (HH:mm) as opposed to full datetime values (MM-dd-yyyy HH:mm) because:
- if no year is provided, it'll default to 1970
- if no month is provided, it'll default to Jan
- if no day is provided, it'll default to the 1st of the month (if no month is provided, it'll default to Jan)
You see, if you added just the year component:
"date_range": {
"field": "timestamp",
"format": "HH:mm yyyy", <---
"ranges": [
{
"key": "MORNING",
"from": "06:00 2021", <---
"to": "14:00 2021" <---
}
],
"keyed": true
}
Elasticsearch would return:
"MORNING" : {
"from" : 2.16E7,
"from_as_string" : "06:00 1970", <--- 🥴
"to" : 5.04E7,
"to_as_string" : "14:00 1970", <--- 🥴
...
}
Adding a month would solve this particular point-in-time problem but would of course introduce the problem of only being able to aggregate on one single month of one concrete year.
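The 1970 default also explains the numbers in the response from the question: 2.16E7 and 5.04E7 are simply 06:00 and 14:00 on 1970-01-01 expressed as epoch milliseconds, nowhere near any 2021 timestamp. Below is a small Python sketch of that arithmetic. The helper names are made up for illustration and are not part of Elasticsearch, and the 07:00 timestamp is a hypothetical document time, not one taken from the question.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hhmm_to_epoch_millis(hhmm: str) -> int:
    """Treat HH:mm as a time on 1970-01-01 UTC, mirroring the default described above."""
    hours, minutes = (int(part) for part in hhmm.split(":"))
    return (hours * 60 + minutes) * 60 * 1000


def iso_to_epoch_millis(timestamp: str) -> int:
    """Convert a UTC timestamp shaped like the doc's `timestamp` field to epoch millis."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)


def in_range(timestamp: str, start_hhmm: str, end_hhmm: str) -> bool:
    """Range check with an inclusive `from` and an exclusive `to`."""
    millis = iso_to_epoch_millis(timestamp)
    return hhmm_to_epoch_millis(start_hhmm) <= millis < hhmm_to_epoch_millis(end_hhmm)


if __name__ == "__main__":
    print(hhmm_to_epoch_millis("06:00"))  # 21600000, the 2.16E7 seen in the response
    # Hypothetical morning document: it still misses the 1970-based MORNING range.
    print(in_range("2021-02-01T07:00:00.000Z", "06:00", "14:00"))  # False
```

Even a document stamped in the middle of the morning shift lands outside the bucket, which matches the doc_count of 0 in the question.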
Add a date field, called time, to your mapping:
{
"mappings": {
"properties": {
"streamId": {
"type": "keyword"
},
...
"time": {
"type": "date", <---
"format": "HH:mm:ss.SSSz"
}
}
}
}
Populate this field in your docs (possibly through an _update_by_query call):
{
"streamId": "ENRG_0054",
...
"timestamp": "2021-02-01T00:19:39.161Z",
"time": "00:19:39.161Z", <---
"tags": []
}
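One way to derive the time value is to cut it out of the existing timestamp before (re)indexing. A minimal Python sketch, assuming every timestamp has the ISO-8601 shape shown in the example doc; the function name `add_time_field` is hypothetical:

```python
def add_time_field(doc: dict) -> dict:
    """Return a copy of `doc` with a `time` field holding the time-of-day part of `timestamp`."""
    _date_part, separator, time_part = doc["timestamp"].partition("T")
    if not separator or not time_part:
        raise ValueError(f"unexpected timestamp format: {doc['timestamp']!r}")
    # The original doc is left untouched; only the copy gains the new field.
    return {**doc, "time": time_part}


if __name__ == "__main__":
    example = {"streamId": "ENRG_0054", "timestamp": "2021-02-01T00:19:39.161Z"}
    print(add_time_field(example)["time"])  # 00:19:39.161Z
```

The same string split could presumably be done inside the update script itself; the sketch only shows what value ends up in the field.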
Aggregate on the time field instead:
"days": {
"date_histogram": {
"field": "timestamp", <---
"interval": "1d"
},
"aggs": {
"shifts": {
"date_range": {
"field": "time", <---
"format": "HH:mm",
"ranges": [
That's all there is to it!
PS: Under the hood, the time values will be auto-assigned to 1970, but that's fine because you're only interested in the time values.
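Putting the pieces together, the shift buckets can be generated from a small table of times instead of being written out by hand. This is only a sketch: `SHIFTS` and `build_shift_aggs` are hypothetical names, the shift times and the MIN/MAX/DIFF sub-aggregations are copied from the question, and the outer streamId terms aggregation is left out for brevity.

```python
import json

# Shift boundaries taken from the question: key -> (from, to) in HH:mm.
SHIFTS = {
    "MORNING": ("06:00", "14:00"),
    "AFTERNOON": ("14:00", "22:00"),
}


def build_shift_aggs(value_field: str = "@floatMessage.value.value") -> dict:
    """Build the `days` aggregation: daily buckets on `timestamp`, shifts on `time`."""
    ranges = [
        {"key": key, "from": start, "to": end}
        for key, (start, end) in SHIFTS.items()
    ]
    return {
        "days": {
            "date_histogram": {"field": "timestamp", "interval": "1d"},
            "aggs": {
                "shifts": {
                    # The time-only field is what makes HH:mm ranges match.
                    "date_range": {
                        "field": "time",
                        "format": "HH:mm",
                        "ranges": ranges,
                        "keyed": True,
                    },
                    "aggs": {
                        "MAX": {"max": {"field": value_field}},
                        "MIN": {"min": {"field": value_field}},
                        "DIFF": {
                            "bucket_script": {
                                "buckets_path": {"min": "MIN", "max": "MAX"},
                                "script": {"source": "return (params.max-params.min)"},
                            }
                        },
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_shift_aggs(), indent=2))
```

The printed JSON is meant to be used as the aggs part of the search body; adding a shift then only requires a new entry in `SHIFTS`.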