SQL window function to fill gaps on daily values

I have a dataset as follows:

+---------------------+---------+--------+
| timestamp           | person  | value  |
|---------------------+---------+--------|
| 2022-06-01 00:00:00 | 1       | 0.01   |
| 2022-06-01 00:00:00 | 2       | 0      |
| 2022-06-01 00:00:00 | 3       | 1      |

| 2022-06-02 07:00:00 | 1       | 0.15   |
| 2022-06-02 07:00:00 | 2       | 0.5    |

| 2022-06-03 01:00:00 | 1       | 0.03   |
+---------------------+---------+--------+

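I want to fill the gaps so that every person appears on every day, even on days where they have no record. For example, person 3 has a value of 1 on 2022-06-01 but does not appear on 2022-06-02, so a record for person 3 should be added on 2022-06-02 carrying the previous day's value. However, if person 3 already has a record on 2022-06-02, nothing should be done. The desired output is: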

+---------------------+---------+--------+
| timestamp           | person  | value  |
|---------------------+---------+--------|
| 2022-06-01 00:00:00 | 1       | 0.01   |
| 2022-06-01 00:00:00 | 2       | 0      |
| 2022-06-01 00:00:00 | 3       | 1      | 

| 2022-06-02 07:00:00 | 1       | 0.15   |
| 2022-06-02 07:00:00 | 2       | 0.5    |
| 2022-06-02 00:00:00 | 3       | 1      |

| 2022-06-03 01:00:00 | 1       | 0.03   |
| 2022-06-03 01:00:00 | 2       | 0.5    |
| 2022-06-03 01:00:00 | 3       | 1      |
+---------------------+---------+--------+

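I think this can be done with series generation and a window function, but I can't seem to get a working solution. (It needs to be efficient, because the source table is large.)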

Thanks in advance for any replies!

Consider the approach below:

-- Expand each row into one row per day, from its own date up to the day
-- before the same person's next record (or the latest date in the table).
select if(date = date(timestamp), timestamp(timestamp), timestamp(date)) timestamp, person, value
from (
  select *, coalesce(
    first_value(date(timestamp)) over next_date - 1,  -- day before this person's next record
    max(date(timestamp)) over last_date,              -- otherwise the latest later date in the table
    date(timestamp)) next_date                        -- otherwise the row's own date
  from your_table
  window last_date as (order by unix_date(date(timestamp)) range between 1 following and unbounded following),
    next_date as (partition by person order by unix_date(date(timestamp)) range between 1 following and unbounded following)
), unnest(generate_date_array(date(timestamp), next_date)) date

If applied to the sample data in your question, the output is:

[image: query output]

Another approach is:

-- Build every (date, person) pair from the dates and persons present in the table,
-- attach the real row where one exists, and carry the last non-NULL value forward.
SELECT COALESCE(timestamp, timestamp(date)) timestamp,
       p.person,
       LAST_VALUE(s.value IGNORE NULLS) OVER w value
  FROM (SELECT DISTINCT DATE(timestamp) date FROM sample) t,
       (SELECT DISTINCT person FROM sample) p
  LEFT JOIN sample s ON t.date = DATE(s.timestamp) AND p.person = s.person
WINDOW w AS (PARTITION BY p.person ORDER BY COALESCE(timestamp, timestamp(date)));
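
To experiment without access to the real table, the sketch below inlines the question's sample rows as a CTE and fills the gaps with a generated calendar plus a window function, along the lines the question suggests. It is written in BigQuery Standard SQL, like the queries above. The names `calendar`, `persons`, `cal_date`, `first_day` and `last_day` are made up for illustration. It takes the third day to be 2022-06-03 and assumes at most one row per person per day. Unlike the previous query, the calendar also covers days on which nobody has a record.

```sql
-- Sample rows from the question; the third day is assumed to be 2022-06-03.
WITH sample AS (
  SELECT TIMESTAMP '2022-06-01 00:00:00' AS timestamp, 1 AS person, 0.01 AS value UNION ALL
  SELECT TIMESTAMP '2022-06-01 00:00:00', 2, 0.0 UNION ALL
  SELECT TIMESTAMP '2022-06-01 00:00:00', 3, 1.0 UNION ALL
  SELECT TIMESTAMP '2022-06-02 07:00:00', 1, 0.15 UNION ALL
  SELECT TIMESTAMP '2022-06-02 07:00:00', 2, 0.5 UNION ALL
  SELECT TIMESTAMP '2022-06-03 01:00:00', 1, 0.03
),
-- One row per calendar day between the first and last day in the data.
calendar AS (
  SELECT cal_date
  FROM (
    SELECT MIN(DATE(timestamp)) AS first_day, MAX(DATE(timestamp)) AS last_day
    FROM sample
  ), UNNEST(GENERATE_DATE_ARRAY(first_day, last_day)) AS cal_date
),
-- Every person that appears anywhere in the data.
persons AS (
  SELECT DISTINCT person FROM sample
)
SELECT
  -- Keep the real timestamp when a record exists, else midnight of that day.
  COALESCE(s.timestamp, TIMESTAMP(c.cal_date)) AS timestamp,
  p.person,
  -- Carry the most recent non-NULL value forward within each person.
  LAST_VALUE(s.value IGNORE NULLS) OVER (
    PARTITION BY p.person
    ORDER BY c.cal_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS value
FROM calendar AS c
CROSS JOIN persons AS p
LEFT JOIN sample AS s
  ON DATE(s.timestamp) = c.cal_date AND s.person = p.person
ORDER BY c.cal_date, p.person;
```

Filled rows get midnight of the missing day as their timestamp, and a person with no earlier record on a given day gets a NULL value. The cross join produces one row per person per calendar day, so on a large table it may be worth restricting the date range or the set of persons first.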
