How to fill missing values in a certain time interval

I have a table in the following format:

user  timestamp              count  total_count

xyz   01-01-2020 00:12:00    45        45
xyz   01-01-2020 00:27:00    12        57
xyz   01-01-2020 00:29:00    11        68
xyz   01-01-2020 00:53:00    32        100

I want the data in 5-minute intervals, like this (expected output):

user  timestamp              count  total_count

xyz   01-01-2020 00:05:00    0         0
xyz   01-01-2020 00:10:00    0         0
xyz   01-01-2020 00:15:00    45        45
xyz   01-01-2020 00:20:00    0         45
xyz   01-01-2020 00:25:00    0         45
xyz   01-01-2020 00:30:00    23        68
xyz   01-01-2020 00:35:00    0         68
xyz   01-01-2020 00:40:00    0         68
xyz   01-01-2020 00:45:00    0         68
xyz   01-01-2020 00:50:00    0         68
xyz   01-01-2020 00:55:00    32        100

I tried:

   SELECT
        TIMESTAMP_SECONDS(5*60 * DIV(UNIX_SECONDS(timestamp), 5*60)) timekey,
        SUM(count) AS count,
        MAX(total_count) as total_count
   FROM db.table
   WHERE
        timestamp BETWEEN {{ start_date }}
        AND {{ end_date }}
        AND user = {{ user_id }}
   GROUP BY
        timekey
   ORDER BY
        timekey

Result of the above query:

user  timestamp              count  total_count

xyz   01-01-2020 00:15:00    45        45
xyz   01-01-2020 00:30:00    23        68
xyz   01-01-2020 00:55:00    32        100

How can I fill in the missing timestamps in the above query, with count set to zero and total_count set to the previous non-null value?

Use generate_timestamp_array() to fill in the missing timestamps:

SELECT ts,
       SUM(t.count) AS count,
       MAX(t.total_count) as total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY( {{start_date}}, {{end_date}}, INTERVAL 5 minute)) ts LEFT JOIN
     db.table t
     ON t.timestamp >= ts AND
        t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 minute) AND
        t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;

If the table is partitioned and you need to filter on the partition column, you can slightly modify the query:

SELECT ts,
       SUM(t.count) AS count,
       MAX(t.total_count) as total_count
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY( {{start_date}}, {{end_date}}, INTERVAL 5 minute)) ts LEFT JOIN
     (SELECT t.*
      FROM db.table t
      WHERE timestamp BETWEEN {{ start_date }} AND {{ end_date }}
     ) t
     ON t.timestamp >= ts AND
        t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 minute) AND
        t.user = {{ user_id }}
GROUP BY ts
ORDER BY ts;
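
Note that with a LEFT JOIN, intervals that match no rows come back with NULL for both count and total_count, so the zero fill and the carried-forward total still need one more step. The following is a minimal sketch of that step, not a tested production query. It assumes BigQuery standard SQL, uses an inline CTE named events (a hypothetical stand-in for db.table, holding the four sample rows from the question), and hard-codes the time range and user instead of the {{ }} template parameters. It also assumes each bucket should be labelled by the end of its interval, which is what the expected output suggests (00:12 appears under 00:15), so the join condition is shifted accordingly.

```sql
-- Hypothetical stand-in for db.table: the four sample rows from the question.
WITH events AS (
  SELECT 'xyz' AS user, TIMESTAMP '2020-01-01 00:12:00' AS timestamp,
         45 AS count, 45 AS total_count UNION ALL
  SELECT 'xyz', TIMESTAMP '2020-01-01 00:27:00', 12, 57 UNION ALL
  SELECT 'xyz', TIMESTAMP '2020-01-01 00:29:00', 11, 68 UNION ALL
  SELECT 'xyz', TIMESTAMP '2020-01-01 00:53:00', 32, 100
),
-- One row per 5-minute bucket; buckets without events get NULL aggregates.
buckets AS (
  SELECT
    ts,
    SUM(e.count) AS bucket_count,
    MAX(e.total_count) AS bucket_total
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY(
         TIMESTAMP '2020-01-01 00:05:00',
         TIMESTAMP '2020-01-01 00:55:00',
         INTERVAL 5 MINUTE)) AS ts
  LEFT JOIN events AS e
    -- Assumption: ts marks the END of the bucket, i.e. the interval (ts - 5 min, ts].
    ON e.timestamp > TIMESTAMP_SUB(ts, INTERVAL 5 MINUTE)
   AND e.timestamp <= ts
   AND e.user = 'xyz'
  GROUP BY ts
)
SELECT
  ts,
  -- Empty buckets: count becomes 0.
  COALESCE(bucket_count, 0) AS count,
  -- Empty buckets: reuse the most recent non-null total, or 0 before the first event.
  COALESCE(
    LAST_VALUE(bucket_total IGNORE NULLS) OVER (
      ORDER BY ts
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    0) AS total_count
FROM buckets
ORDER BY ts;
```

If you prefer buckets labelled by their start, as in the queries above, keep the original join condition (t.timestamp >= ts AND t.timestamp < TIMESTAMP_ADD(ts, INTERVAL 5 minute)); the COALESCE and LAST_VALUE ... IGNORE NULLS part stays the same.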
