How to fill missing values in a certain time interval
I have a table in the following format:
user  timestamp            count  total_count
xyz   01-01-2020 00:12:00  45     45
xyz   01-01-2020 00:27:00  12     57
xyz   01-01-2020 00:29:00  11     68
xyz   01-01-2020 00:53:00  32     100
I want the data in 5-minute intervals, as shown below (expected output):
user  timestamp            count  total_count
xyz   01-01-2020 00:05:00  0      0
xyz   01-01-2020 00:10:00  0      0
xyz   01-01-2020 00:15:00  45     45
xyz   01-01-2020 00:20:00  0      45
xyz   01-01-2020 00:25:00  0      45
xyz   01-01-2020 00:30:00  23     68
xyz   01-01-2020 00:35:00  0      68
xyz   01-01-2020 00:40:00  0      68
xyz   01-01-2020 00:45:00  0      68
xyz   01-01-2020 00:50:00  0      68
xyz   01-01-2020 00:55:00  32     100
I tried:
SELECT
  TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(timestamp), 5*60)) AS timekey,
  SUM(count) AS count,
  MAX(total_count) AS total_count
FROM db.table
WHERE
  timestamp BETWEEN {{ start_date }} AND {{ end_date }}
  AND user = {{ user_id }}
GROUP BY
  timekey
ORDER BY
  timekey
Result of the above query:
user  timestamp            count  total_count
xyz   01-01-2020 00:15:00  45     45
xyz   01-01-2020 00:30:00  23     68
xyz   01-01-2020 00:55:00  32     100
How can I fill in the missing timestamps in the above query, setting count to zero and total_count to the previous non-null value?
Use generate_timestamp_array() to fill in the missing values:
SELECT ts,
       SUM(t.count) AS count,
       MAX(t.total_count) AS total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY({{ start_date }}, {{ end_date }}, INTERVAL 5 MINUTE)) ts
LEFT JOIN db.table t
  ON t.timestamp >= ts
 AND t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 MINUTE)
 AND t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;
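Note that for buckets with no matching rows, SUM and MAX return NULL rather than 0 and the previous total. A minimal sketch of one way to handle that, assuming BigQuery Standard SQL: wrap the grouped result, use IFNULL for the count, and use LAST_VALUE(... IGNORE NULLS) as a window function to carry total_count forward. The inline sample CTE and the hard-coded time range stand in for db.table and the {{ start_date }} / {{ end_date }} parameters, and the names bucket_count and bucket_total are only illustrative.

```sql
-- Sketch: zero-fill count and carry total_count forward.
-- `sample` is a stand-in for db.table (assumption for illustration).
WITH sample AS (
  SELECT 'xyz' AS `user`, TIMESTAMP '2020-01-01 00:12:00' AS `timestamp`,
         45 AS `count`, 45 AS total_count UNION ALL
  SELECT 'xyz', TIMESTAMP '2020-01-01 00:27:00', 12, 57 UNION ALL
  SELECT 'xyz', TIMESTAMP '2020-01-01 00:29:00', 11, 68 UNION ALL
  SELECT 'xyz', TIMESTAMP '2020-01-01 00:53:00', 32, 100
),
buckets AS (
  -- One row per 5-minute bucket; empty buckets have NULL aggregates.
  SELECT ts,
         SUM(t.`count`) AS bucket_count,
         MAX(t.total_count) AS bucket_total
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(
         TIMESTAMP '2020-01-01 00:00:00',
         TIMESTAMP '2020-01-01 00:55:00',
         INTERVAL 5 MINUTE)) AS ts
  LEFT JOIN sample AS t
    ON t.`timestamp` >= ts
   AND t.`timestamp` < TIMESTAMP_ADD(ts, INTERVAL 5 MINUTE)
   AND t.`user` = 'xyz'
  GROUP BY ts
)
SELECT ts,
       -- Empty buckets get a count of 0.
       IFNULL(bucket_count, 0) AS `count`,
       -- Latest non-NULL total up to and including this bucket;
       -- buckets before the first event fall back to 0.
       IFNULL(LAST_VALUE(bucket_total IGNORE NULLS) OVER (ORDER BY ts), 0) AS total_count
FROM buckets
ORDER BY ts;
```

With an ORDER BY and no explicit frame, the window runs from the first row to the current row, so LAST_VALUE with IGNORE NULLS picks the most recent bucket that actually had data.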
If you need to filter the table on its partition column, you can modify the query slightly:
SELECT ts,
       SUM(t.count) AS count,
       MAX(t.total_count) AS total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY({{ start_date }}, {{ end_date }}, INTERVAL 5 MINUTE)) ts
LEFT JOIN (
  SELECT t.*
  FROM db.table t
  WHERE timestamp BETWEEN {{ start_date }} AND {{ end_date }}
) t
  ON t.timestamp >= ts
 AND t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 MINUTE)
 AND t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;
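One more detail worth checking: the queries above label each bucket by its start (an event at 00:12 lands in the 00:10 bucket), whereas the expected output in the question appears to label buckets by their end (00:12 shows up under 00:15). The sketch below, which assumes BigQuery Standard SQL and uses two hand-picked timestamps, compares the two labelings for a 300-second interval.

```sql
-- Sketch: start-of-bucket vs end-of-bucket labels for 5-minute intervals.
-- event_ts values are illustrative, not read from db.table.
SELECT
  event_ts,
  -- Round down to the interval start: 00:12:00 -> 00:10:00
  TIMESTAMP_SECONDS(300 * DIV(UNIX_SECONDS(event_ts), 300)) AS bucket_start,
  -- Round up to the interval end: 00:12:00 -> 00:15:00
  -- (a timestamp exactly on a boundary, such as 00:30:00, keeps its own label)
  TIMESTAMP_SECONDS(300 * DIV(UNIX_SECONDS(event_ts) + 299, 300)) AS bucket_end
FROM UNNEST([
  TIMESTAMP '2020-01-01 00:12:00',
  TIMESTAMP '2020-01-01 00:30:00'
]) AS event_ts;
```

To get end-labeled buckets from the generated array instead, the join condition would presumably become t.timestamp > TIMESTAMP_SUB(ts, INTERVAL 5 MINUTE) AND t.timestamp <= ts.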