Filling in missing data in Snowflake
I have a table in Snowflake like this:
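+----+-----+----+
|TIME|USER |ITEM|
+----+-----+----+
|1   |frank|1   |
|2   |frank|0   |
|3   |frank|0   |
|4   |frank|0   |
|5   |frank|2   |
|6   |alf  |5   |
|7   |alf  |0   |
|8   |alf  |6   |
|9   |alf  |0   |
|10  |alf  |9   |
+----+-----+----+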
I want to be able to replace all the zeroes with the next non-zero value, so in the end I have a table like this:
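+----+-----+----+
|TIME|USER |ITEM|
+----+-----+----+
|1   |frank|1   |
|2   |frank|2   |
|3   |frank|2   |
|4   |frank|2   |
|5   |frank|2   |
|6   |alf  |5   |
|7   |alf  |6   |
|8   |alf  |6   |
|9   |alf  |9   |
|10  |alf  |9   |
+----+-----+----+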
How would I write a query that does that in Snowflake?
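To check any query's output against the expected table, here is a minimal plain-Python reference sketch of the requested behaviour (not Snowflake code). It assumes the rows fit in memory as (time, user, item) tuples and that "next" means the nearest later TIME for the same USER; `backfill_zeros` is a hypothetical helper name, and a zero with no later non-zero value is left as 0.

```python
from collections import defaultdict

# Sample data from the question: (time, user, item)
ROWS = [
    (1, "frank", 1), (2, "frank", 0), (3, "frank", 0), (4, "frank", 0), (5, "frank", 2),
    (6, "alf", 5), (7, "alf", 0), (8, "alf", 6), (9, "alf", 0), (10, "alf", 9),
]


def backfill_zeros(rows):
    """Replace each zero ITEM with the next non-zero ITEM of the same USER, by TIME.

    Hypothetical helper for checking query output; zeros with no later
    non-zero value for that user are left as 0. Returns rows sorted by time.
    """
    by_user = defaultdict(list)
    for time, user, item in rows:
        by_user[user].append((time, item))

    filled = []
    for user, events in by_user.items():
        next_nonzero = None
        # Walk each user's rows newest-first, so the "next" value is already known.
        for time, item in sorted(events, reverse=True):
            if item != 0:
                next_nonzero = item
            # Non-zero rows keep their own value; zeros take the nearest later non-zero.
            filled.append((time, user, next_nonzero if next_nonzero is not None else item))
    return sorted(filled)


for row in backfill_zeros(ROWS):
    print(*row)
```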
You can use the CONDITIONAL_CHANGE_EVENT function for this (documented in the Snowflake SQL reference):
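with base_table as (
select
t1.*,
-- event_num numbers the runs of equal ITEM values per user, newest row first
conditional_change_event(t1.item) over (partition by t1.user order by t1.time desc) event_num
from test_table t1
)
select
t1.time,
t1.user,
t1.item old_item,
-- a zero row takes the item of the run immediately after it in time (event_num - 1)
coalesce(t2.item, t1.item) new_item
from base_table t1
left join (
-- one row per (user, event_num); ITEM is constant within a run
select distinct b.user, b.event_num, b.item from base_table b
) t2
on t1.user = t2.user
and t1.event_num = t2.event_num + 1
and t1.item = 0
order by t1.time asc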
Results of the SQL above:
+----+-----+--------+--------+
|TIME|USER |OLD_ITEM|NEW_ITEM|
+----+-----+--------+--------+
|1   |frank|1       |1       |
|2   |frank|0       |2       |
|3   |frank|0       |2       |
|4   |frank|0       |2       |
|5   |frank|2       |2       |
|6   |alf  |5       |5       |
|7   |alf  |0       |6       |
|8   |alf  |6       |6       |
|9   |alf  |0       |9       |
|10  |alf  |9       |9       |
+----+-----+--------+--------+
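To see why the join on `event_num + 1` works, the sketch below emulates in plain Python the numbering that `CONDITIONAL_CHANGE_EVENT(item) OVER (PARTITION BY user ORDER BY time DESC)` is documented to produce: 0 for the first row of each partition, incremented by 1 whenever ITEM differs from the previous row. It is an illustration only; NULL handling is not modelled, and `change_events` and `fill_via_change_events` are hypothetical names.

```python
from itertools import groupby

# Sample data from the question: (time, user, item)
ROWS = [
    (1, "frank", 1), (2, "frank", 0), (3, "frank", 0), (4, "frank", 0), (5, "frank", 2),
    (6, "alf", 5), (7, "alf", 0), (8, "alf", 6), (9, "alf", 0), (10, "alf", 9),
]


def change_events(values):
    """Emulate CONDITIONAL_CHANGE_EVENT over values already in window order."""
    events, current = [], 0
    for i, value in enumerate(values):
        if i > 0 and value != values[i - 1]:
            current += 1  # a new run of equal values starts here
        events.append(current)
    return events


def fill_via_change_events(rows):
    """Return (time, user, event_num, old_item, new_item) rows sorted by time."""
    result = []
    for user, group in groupby(sorted(rows, key=lambda r: r[1]), key=lambda r: r[1]):
        part = sorted(group, key=lambda r: r[0], reverse=True)  # ORDER BY time DESC
        events = change_events([item for _, _, item in part])
        # ITEM is constant within a run, so each event_num maps to one item.
        item_by_event = {e: item for e, (_, _, item) in zip(events, part)}
        for e, (time, _, item) in zip(events, part):
            # A zero row is in run e; run e - 1 holds the row just after it in time.
            new_item = item_by_event.get(e - 1, item) if item == 0 else item
            result.append((time, user, e, item, new_item))
    return sorted(result)


for row in fill_via_change_events(ROWS):
    print(*row)
```

For the sample data the event numbers come out as 2, 1, 1, 1, 0 for frank (times 1 to 5) and 4, 3, 2, 1, 0 for alf (times 6 to 10), so every zero run points at the non-zero run one step newer.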
You can use LEAD() with IGNORE NULLS:
select t.*,
(case when item = 0
then lead(nullif(item, 0)) ignore nulls over (partition by user order by time)
else item
end) as imputed_item
from t;
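`LEAD(...) IGNORE NULLS` returns, for each row, the first non-NULL value among the following rows of the partition, and `NULLIF(item, 0)` turns the zeros into NULLs so they are skipped. The plain-Python sketch below emulates that for one user's items already in TIME order; `lead_ignore_nulls` and `impute` are hypothetical helpers, not Snowflake APIs. Note that a trailing zero with no later non-zero value comes back as NULL (None here) rather than 0.

```python
def lead_ignore_nulls(values):
    """For each position, the first non-None value strictly after it, else None."""
    out = [None] * len(values)
    upcoming = None
    # Scan right to left, remembering the nearest non-None value seen so far.
    for i in range(len(values) - 1, -1, -1):
        out[i] = upcoming
        if values[i] is not None:
            upcoming = values[i]
    return out


def impute(items):
    """CASE WHEN item = 0 THEN LEAD(NULLIF(item, 0)) IGNORE NULLS ... ELSE item END."""
    nullified = [None if x == 0 else x for x in items]  # NULLIF(item, 0)
    leads = lead_ignore_nulls(nullified)
    return [lead if item == 0 else item for item, lead in zip(items, leads)]


print(impute([1, 0, 0, 0, 2]))  # frank, times 1-5: [1, 2, 2, 2, 2]
print(impute([5, 0, 6, 0, 9]))  # alf, times 6-10: [5, 6, 6, 9, 9]
```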
You can also phrase this using LAST_VALUE():
select t.*,
last_value(nullif(item, 0)) ignore nulls over (partition by user order by time desc)
from t;
If you want to use FIRST_VALUE() or LAST_VALUE() in Snowflake, keep in mind that Snowflake's default window frame for these functions differs from the ANSI standard, as described in the Snowflake documentation. When the window has an ORDER BY, ANSI defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but Snowflake defaults to ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, so you have to state the frame you want explicitly. That is why the LAST_VALUE example from the previous answer does not work correctly: over the whole partition it returns the last non-NULL value in TIME DESC order, i.e. each user's earliest non-zero item, on every row. Here is a version that works:
select t.*,
last_value(nullif(item, 0)) ignore nulls over (partition by user order by time desc rows between unbounded preceding and current row)
from t;
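The effect of the frame can be reproduced outside Snowflake. The plain-Python sketch below (a hypothetical helper, illustration only) evaluates `LAST_VALUE(NULLIF(item, 0)) IGNORE NULLS` over alf's rows in TIME DESC order, once with the whole partition as the frame and once with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

```python
def last_value_ignore_nulls(values, up_to_current_row):
    """Emulate LAST_VALUE(x) IGNORE NULLS over values already in window order.

    up_to_current_row=False: frame is the whole partition
        (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).
    up_to_current_row=True: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    """
    out = []
    for i in range(len(values)):
        frame = values[: i + 1] if up_to_current_row else values
        non_null = [v for v in frame if v is not None]
        out.append(non_null[-1] if non_null else None)
    return out


# alf's NULLIF(item, 0) values for times 10, 9, 8, 7, 6 (ORDER BY time DESC)
alf_desc = [9, None, 6, None, 5]
print(last_value_ignore_nulls(alf_desc, up_to_current_row=False))  # [5, 5, 5, 5, 5]
print(last_value_ignore_nulls(alf_desc, up_to_current_row=True))   # [9, 9, 6, 6, 5]
```

Read back in time order (6 to 10), the explicit frame gives 5, 6, 6, 9, 9 as expected, while the whole-partition frame returns alf's earliest non-zero item, 5, on every row.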
Nothing wrong with the above solutions, but here's a different approach. I think it's simpler.
select * from good
union all
select
bad.time
,bad.user
,min_by(good.item, good.time) -- item of the nearest later non-zero row
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
Full copy/paste/run SQL:
with cte as (
select * from (
select 1 time, 'frank' user , 1 item union
select 2 time, 'frank' user , 0 item union
select 3 time, 'frank' user , 0 item union
select 4 time, 'frank' user , 0 item union
select 5 time, 'frank' user , 2 item union
select 6 time, 'alf' user , 5 item union
select 7 time, 'alf' user , 0 item union
select 8 time, 'alf' user , 6 item union
select 9 time, 'alf' user , 0 item union
select 10 time, 'alf' user , 9) )
, good as (select * from cte where item<> 0)
, bad as (select * from cte where item= 0)
select * from good
union all
select
bad.time
,bad.user
,min_by(good.item, good.time) -- item of the nearest later non-zero row
from bad
left outer join
good on good.user=bad.user and good.time>bad.time
group by
1,2
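One caveat worth checking: `MIN(good.item)` returns the smallest item among all later non-zero rows, which matches "the next non-zero value" only when items never decrease over time, as happens to be true in the sample data. The plain-Python sketch below (hypothetical helpers, illustration only) compares that with taking the item of the nearest later row, which is what Snowflake's `MIN_BY(good.item, good.time)` aggregate is meant to return, on the question's data and on a made-up decreasing example.

```python
# Sample data from the question: (time, user, item)
ROWS = [
    (1, "frank", 1), (2, "frank", 0), (3, "frank", 0), (4, "frank", 0), (5, "frank", 2),
    (6, "alf", 5), (7, "alf", 0), (8, "alf", 6), (9, "alf", 0), (10, "alf", 9),
]


def fill_from_later_good(rows, pick):
    """Keep non-zero rows; fill each zero row from later non-zero rows of the same user."""
    good = [r for r in rows if r[2] != 0]
    bad = [r for r in rows if r[2] == 0]
    filled = list(good)
    for time, user, _ in bad:
        later = [(g_time, g_item) for g_time, g_user, g_item in good
                 if g_user == user and g_time > time]
        # LEFT OUTER JOIN: no later good row means NULL (None).
        filled.append((time, user, pick(later) if later else None))
    return sorted(filled)


def smallest_item(later):
    return min(item for _, item in later)  # like MIN(good.item)


def nearest_item(later):
    return min(later)[1]  # item at the smallest later time, like MIN_BY(good.item, good.time)


# Hypothetical data where items decrease over time.
DECREASING = [(6, "alf", 5), (7, "alf", 0), (8, "alf", 9), (9, "alf", 0), (10, "alf", 6)]
print(fill_from_later_good(DECREASING, smallest_item))  # time 7 gets 6
print(fill_from_later_good(DECREASING, nearest_item))   # time 7 gets 9
```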