Filling in missing data in Snowflake

I have a table in Snowflake like this:

TIME   USER   ITEM
1      frank  1
2      frank  0
3      frank  0
4      frank  0
5      frank  2
6      alf    5
7      alf    0
8      alf    6
9      alf    0
10     alf    9

I want to replace every zero with the next non-zero value for the same user, so that I end up with a table like this:

TIME   USER   ITEM
1      frank  1
2      frank  2
3      frank  2
4      frank  2
5      frank  2
6      alf    5
7      alf    6
8      alf    6
9      alf    9
10     alf    9

How would I write a query that does that in Snowflake?

You can use the conditional_change_event function for this (it is documented in the Snowflake docs):

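with base_table as (
    select
        t1.*,
        -- numbered per user, walking backwards in time: the counter goes up by 1
        -- each time item differs from the previous row's item
        conditional_change_event(item) over (partition by user order by time desc) event_num
    from test_table t1
)
select
    t1.time,
    t1.user,
    t1.item                    old_item,
    coalesce(t2.item, t1.item) new_item
from base_table t1
    -- a run of zeros (event n) picks up the non-zero item of the event before it (n - 1)
    left join base_table t2
        on t1.user = t2.user
       and t1.event_num = t2.event_num + 1
       and t1.item = 0
order by t1.time asc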

Results of the SQL above:

+----+-----+--------+--------+
|TIME|USER |OLD_ITEM|NEW_ITEM|
+----+-----+--------+--------+
|1   |frank|1       |1       |
|2   |frank|0       |2       |
|3   |frank|0       |2       |
|4   |frank|0       |2       |
|5   |frank|2       |2       |
|6   |alf  |5       |5       |
|7   |alf  |0       |6       |
|8   |alf  |6       |6       |
|9   |alf  |0       |9       |
|10  |alf  |9       |9       |
+----+-----+--------+--------+
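
To make the join condition easier to follow, here is a small Python sketch that mimics the event numbering for one user's rows taken on their own (frank's, ordered by time descending) and then applies the same "event n takes its item from event n - 1" lookup. It is only a model of the behaviour described above, not Snowflake code; the helper name and the list layout are made up for illustration.

```python
def conditional_change_event(values):
    """Model of the counter: starts at 0 and goes up by 1 whenever
    a value differs from the value in the previous row."""
    events, counter = [], 0
    for i, value in enumerate(values):
        if i > 0 and value != values[i - 1]:
            counter += 1
        events.append(counter)
    return events

# frank's rows ordered by time DESC, matching the window's ORDER BY.
times = [5, 4, 3, 2, 1]
items = [2, 0, 0, 0, 1]
events = conditional_change_event(items)

# Mimic the self join: a zero row in event n takes the item of event n - 1;
# rows without a match keep their own item (the coalesce fallback).
item_by_event = dict(zip(events, items))
new_items = [
    item_by_event[e - 1] if item == 0 and (e - 1) in item_by_event else item
    for e, item in zip(events, items)
]

print(events)     # [0, 1, 1, 1, 2]
print(new_items)  # [2, 2, 2, 2, 1]
```

The three zero rows share one event number, so a single join condition fills all of them with the same following value.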

You can use lead() with ignore nulls:

select t.*,
       (case when item = 0
             then lead(nullif(item, 0) ignore nulls) over (partition by user order by time)
             else item
        end) as imputed_item
from t;

You can also phrase this using last_value() over a descending order:

select t.*,
       last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc)
from t;

If you want to use first_value() or last_value() in Snowflake, keep in mind that Snowflake's default window frame for these functions differs from the ANSI standard, as described in the Snowflake documentation. The ANSI default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. In Snowflake you have to state a frame ending at the current row explicitly, because otherwise the default is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. That is why the LAST_VALUE example from the previous answer does not work correctly. Here is one example that works:

select t.*,
       last_value(nullif(item, 0) ignore nulls) over (partition by user order by time desc rows between unbounded preceding and current row)
from t;
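
To see why the frame matters, the following Python sketch models what last_value(nullif(item, 0) ignore nulls) returns for alf's rows under each of the two frames. It is a plain model of the semantics described above, not Snowflake code, and the function and parameter names are hypothetical.

```python
# alf's rows as (time, item), already sorted by time DESC like the window.
rows_desc = [(10, 9), (9, 0), (8, 6), (7, 0), (6, 5)]

def last_value_ignore_nulls(rows, whole_partition):
    """Model last_value(nullif(item, 0) ignore nulls) for every row.

    whole_partition=True  -> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    whole_partition=False -> ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    """
    # nullif(item, 0): zeros become NULL (None here).
    values = [item if item != 0 else None for _, item in rows]
    result = {}
    for i, (time, _) in enumerate(rows):
        frame = values if whole_partition else values[: i + 1]
        non_null = [v for v in frame if v is not None]
        result[time] = non_null[-1] if non_null else None
    return result

whole = last_value_ignore_nulls(rows_desc, whole_partition=True)
running = last_value_ignore_nulls(rows_desc, whole_partition=False)

print(whole)    # {10: 5, 9: 5, 8: 5, 7: 5, 6: 5}
print(running)  # {10: 9, 9: 9, 8: 6, 7: 6, 6: 5}
```

With the whole-partition frame every row gets the item of the earliest row, while the frame that stops at the current row yields the expected fill.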

Nothing wrong with the above solutions, but here's a different approach that I think is simpler.

select * from good
union all
select 
     bad.time
    ,bad.user
    ,min_by(good.item, good.time)  -- item at the earliest later time, not just the smallest later item
from  bad 
left outer join  
good on good.user=bad.user and good.time>bad.time 
group by
    1,2

Full copy/paste/run SQL:

with cte as (
select * from (
select 1  time, 'frank' user , 1 item union
select 2  time, 'frank' user , 0 item union
select 3  time, 'frank' user , 0 item union
select 4  time, 'frank' user , 0 item union
select 5  time, 'frank' user , 2 item union
select 6  time, 'alf' user ,   5 item union
select 7  time, 'alf' user ,   0 item union
select 8  time, 'alf' user ,   6 item union
select 9  time, 'alf' user ,   0 item union
select 10 time, 'alf' user ,   9) )
, good as (select * from cte where item<> 0) 
, bad as (select * from cte where item= 0) 


select *  from  good
union all
select 
     bad.time
    ,bad.user
    ,min_by(good.item, good.time)  -- item at the earliest later time, not just the smallest later item
from  bad 
left outer join  
    good on good.user=bad.user and good.time>bad.time 
group by
    1,2
