SQL window function to fill gaps on daily values
I have a dataset like the following:
+---------------------+---------+--------+
| timestamp | person | value |
|---------------------+---------+--------|
| 2022-06-01 00:00:00 | 1 | 0.01 |
| 2022-06-01 00:00:00 | 2 | 0 |
| 2022-06-01 00:00:00 | 3 | 1 |
| 2022-06-02 07:00:00 | 1 | 0.15 |
| 2022-06-02 07:00:00 | 2 | 0.5 |
| 2021-06-03 01:00:00 | 1 | 0.03 |
+---------------------+---------+--------+
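I'd like to fill the gaps so that every person appears on every day, even on days where they have no record. For example, person 3 has a value of 1 on 2022-06-01 but does not appear on 2022-06-02, so a record should be added for 2022-06-02 carrying the previous day's value. However, if person 3 already has a record on 2022-06-02, nothing should change. The expected result: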
+---------------------+---------+--------+
| timestamp | person | value |
|---------------------+---------+--------|
| 2022-06-01 00:00:00 | 1 | 0.01 |
| 2022-06-01 00:00:00 | 2 | 0 |
| 2022-06-01 00:00:00 | 3 | 1 |
| 2022-06-02 07:00:00 | 1 | 0.15 |
| 2022-06-02 07:00:00 | 2 | 0.5 |
| 2022-06-02 00:00:00 | 3 | 1 |
| 2021-06-03 01:00:00 | 1 | 0.03 |
| 2022-06-03 01:00:00 | 2 | 0.5 |
| 2022-06-03 01:00:00 | 3 | 1 |
+---------------------+---------+--------+
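I think this can be done with series generation and a window function, but I can't seem to get a working solution. (The source table is large, so the solution needs to be efficient.)
Thanks in advance for any replies!
Consider the following approach: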
select if(date = date(timestamp), timestamp(timestamp), timestamp(date)) timestamp, person, value
from (
  -- next_date is the last day each row has to cover: the day before the same
  -- person's next record, else the latest later date in the table, else the row's own date
  select *, coalesce(
      first_value(date(timestamp)) over next_date - 1,
      max(date(timestamp)) over last_date,
      date(timestamp)) next_date
  from your_table
  window last_date as (order by unix_date(date(timestamp)) range between 1 following and unbounded following),
    next_date as (partition by person order by unix_date(date(timestamp)) range between 1 following and unbounded following)
), unnest(generate_date_array(date(timestamp), next_date)) date  -- one output row per covered day
If applied to the sample data in your question, the output is: (output screenshot not preserved)
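To make the idea behind this query easier to follow, here is a minimal Python sketch of the same logic: each record covers the days from its own date up to the day before that person's next record, or up to the latest date in the table when the person has no later record. The function name `fill_gaps` and the in-memory `rows` list are illustrative only. The sketch assumes at most one record per person per day, and it assumes the `2021-06-03` row in the question is a typo for `2022-06-03`.

```python
from datetime import date, timedelta

# Sample rows from the question, reduced to (day, person, value).
# Assumption: the 2021-06-03 row is treated as 2022-06-03.
rows = [
    (date(2022, 6, 1), 1, 0.01),
    (date(2022, 6, 1), 2, 0.0),
    (date(2022, 6, 1), 3, 1.0),
    (date(2022, 6, 2), 1, 0.15),
    (date(2022, 6, 2), 2, 0.5),
    (date(2022, 6, 3), 1, 0.03),
]


def fill_gaps(rows):
    """Repeat each record on every following day until the person's next record."""
    last_day = max(day for day, _, _ in rows)

    # Group records by person, ordered by day.
    by_person = {}
    for day, person, value in sorted(rows):
        by_person.setdefault(person, []).append((day, value))

    filled = []
    for person, records in by_person.items():
        for i, (day, value) in enumerate(records):
            if i + 1 < len(records):
                # Stop the day before this person's next record.
                end = records[i + 1][0] - timedelta(days=1)
            else:
                # No later record: carry the value to the last day in the table.
                end = last_day
            current = day
            while current <= end:
                filled.append((current, person, value))
                current += timedelta(days=1)
    return sorted(filled)


filled = fill_gaps(rows)
for row in filled:
    print(row)
```

Under those assumptions the sketch returns nine rows, one per person per day, which is the same shape as the expected table in the question.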
Another approach is:
SELECT COALESCE(timestamp, timestamp(date)) timestamp,
       p.person,
       LAST_VALUE(s.value IGNORE NULLS) OVER w value
  FROM (SELECT DISTINCT DATE(timestamp) date FROM sample) t,
       (SELECT DISTINCT person FROM sample) p
  LEFT JOIN sample s ON t.date = DATE(s.timestamp) AND p.person = s.person
WINDOW w AS (PARTITION BY p.person ORDER BY COALESCE(timestamp, timestamp(date)));
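If you want to experiment with this cross-join-and-forward-fill idea without a BigQuery project, a rough local equivalent can be run with Python's built-in `sqlite3` module. This is only a sketch under several assumptions: the column is named `ts` instead of `timestamp`, the `2021-06-03` row is treated as `2022-06-03`, and the bundled SQLite is version 3.25 or newer (needed for window functions). SQLite has no `IGNORE NULLS`, so the sketch swaps in a different forward-fill technique: a running `COUNT` of non-null values forms groups, and `MAX` within each group picks up the single real value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (ts TEXT, person INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        ("2022-06-01 00:00:00", 1, 0.01),
        ("2022-06-01 00:00:00", 2, 0),
        ("2022-06-01 00:00:00", 3, 1),
        ("2022-06-02 07:00:00", 1, 0.15),
        ("2022-06-02 07:00:00", 2, 0.5),
        ("2022-06-03 01:00:00", 1, 0.03),  # assumption: year corrected from 2021
    ],
)

query = """
WITH days AS (SELECT DISTINCT date(ts) AS day FROM sample),
     people AS (SELECT DISTINCT person FROM sample),
     grid AS (
         SELECT d.day, p.person, s.ts, s.value,
                -- COUNT ignores NULLs, so the counter only advances on real
                -- records; gap rows share the group of the record before them.
                COUNT(s.value) OVER (PARTITION BY p.person ORDER BY d.day) AS grp
         FROM days d
         CROSS JOIN people p
         LEFT JOIN sample s ON date(s.ts) = d.day AND s.person = p.person
     )
SELECT COALESCE(ts, day || ' 00:00:00') AS ts,
       person,
       -- Each group holds exactly one non-null value, so MAX forward-fills it.
       MAX(value) OVER (PARTITION BY person, grp) AS value
FROM grid
ORDER BY day, person
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Like the query above, this builds the day list from the dates present in the table, so a day on which nobody has a record is not generated. A person's rows before their first record would come back with a NULL value.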