
In BigQuery, replace null with number in a column under certain circumstances

It is difficult to explain in words what we are trying to accomplish, but easy to explain with an example. We have an integer column that only increases within a partition and that also contains many null values:

with
  t1 as (
    select 1 as rowNum, null as col1 union all
    select 2 as rowNum, null as col1 union all
    select 3 as rowNum, 1 as col1 union all
    select 4 as rowNum, null as col1 union all
    select 5 as rowNum, null as col1 union all
    select 6 as rowNum, null as col1 union all
    select 7 as rowNum, null as col1 union all
    select 8 as rowNum, null as col1 union all
    select 9 as rowNum, 2 as col1 union all
    select 10 as rowNum, 2 as col1 union all
    select 11 as rowNum, null as col1 union all
    select 12 as rowNum, 2 as col1 union all
    select 13 as rowNum, null as col1 union all
    select 14 as rowNum, null as col1 union all
    select 15 as rowNum, 2 as col1 union all
    select 16 as rowNum, null as col1 union all
    select 17 as rowNum, null as col1 union all
    select 18 as rowNum, null as col1 union all
    select 19 as rowNum, null as col1 union all
    select 20 as rowNum, null as col1 union all
    select 21 as rowNum, null as col1 union all
    select 22 as rowNum, 3 as col1 union all
    select 23 as rowNum, 3 as col1 union all
    select 24 as rowNum, null as col1 union all
    select 25 as rowNum, 3 as col1 union all
    select 26 as rowNum, 3 as col1 union all
    select 27 as rowNum, null as col1 union all
    select 28 as rowNum, null as col1 union all
    select 29 as rowNum, null as col1 union all
    select 30 as rowNum, 4 as col1 union all
    select 31 as rowNum, 4 as col1 union all
    select 32 as rowNum, null as col1 union all
    select 33 as rowNum, null as col1
  )

select * from t1

Most of the null values in col1 should be kept. However, if null values fall between two occurrences of the same integer, those nulls should be replaced with that integer. In the example above, the nulls in rows 11, 13 and 14 should be replaced with 2, and the null in row 24 should be replaced with 3, because these rows fall between two of the same integer. All other null values should remain null.

This can be solved with window functions: part1 looks back and part2 looks forward. If last_value (ignoring nulls) is the same in both directions, take that value; otherwise return null.

with
  t1 as (
    select 1 as rowNum, null as col1 union all
    select 2 as rowNum, null as col1 union all
    select 3 as rowNum, 1 as col1 union all
    select 4 as rowNum, null as col1 union all
    select 5 as rowNum, null as col1 union all
    select 6 as rowNum, null as col1 union all
    select 7 as rowNum, null as col1 union all
    select 8 as rowNum, null as col1 union all
    select 9 as rowNum, 2 as col1 union all
    select 10 as rowNum, 2 as col1 union all
    select 11 as rowNum, null as col1 union all
    select 12 as rowNum, 2 as col1 union all
    select 13 as rowNum, null as col1 union all
    select 14 as rowNum, null as col1 union all
    select 15 as rowNum, 2 as col1 union all
    select 16 as rowNum, null as col1 union all
    select 17 as rowNum, null as col1 union all
    select 18 as rowNum, null as col1 union all
    select 19 as rowNum, null as col1 union all
    select 20 as rowNum, null as col1 union all
    select 21 as rowNum, null as col1 union all
    select 22 as rowNum, 3 as col1 union all
    select 23 as rowNum, 3 as col1 union all
    select 24 as rowNum, null as col1 union all
    select 25 as rowNum, 3 as col1 union all
    select 26 as rowNum, 3 as col1 union all
    select 27 as rowNum, null as col1 union all
    select 28 as rowNum, null as col1 union all
    select 29 as rowNum, null as col1 union all
    select 30 as rowNum, 4 as col1 union all
    select 31 as rowNum, 4 as col1 union all
    select 32 as rowNum, null as col1 union all
    select 33 as rowNum, null as col1
  )

select *,
  if(
    -- part1: nearest non-null value at or before the current row
    -- part2: nearest non-null value at or after the current row
    last_value(col1 ignore nulls) over part1 = last_value(col1 ignore nulls) over part2,
    last_value(col1 ignore nulls) over part1,
    null
  ) as col1_new
from t1
window
  part1 as (order by rowNum asc rows between unbounded preceding and current row),
  part2 as (order by rowNum desc rows between unbounded preceding and current row)
order by 1
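
To see what each window contributes, the two lookups can be exposed as separate columns. The sketch below is only an illustration: it runs on a shortened copy of the sample (rows 9 to 16 and row 22), and the column names prev_val and next_val are made up here, not part of the answer.

```sql
-- Shortened sample: rows 9-16 and 22 from the question's data.
with
  t1 as (
    select 9 as rowNum, 2 as col1 union all
    select 10, 2 union all
    select 11, null union all
    select 12, 2 union all
    select 13, null union all
    select 14, null union all
    select 15, 2 union all
    select 16, null union all
    select 22, 3
  )

select
  rowNum,
  col1,
  -- nearest non-null value looking back (current row included)
  last_value(col1 ignore nulls) over part1 as prev_val,
  -- nearest non-null value looking forward (current row included)
  last_value(col1 ignore nulls) over part2 as next_val
from t1
window
  part1 as (order by rowNum asc rows between unbounded preceding and current row),
  part2 as (order by rowNum desc rows between unbounded preceding and current row)
order by rowNum
```

On this subset, rows 11, 13 and 14 have prev_val = next_val = 2, so the if() above fills them with 2. Row 16 has prev_val = 2 and next_val = 3, so it stays null.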

Consider also the approach below (it reuses the t1 CTE from the question):

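select * except(grp),
  -- fill a null only when the next and the previous non-null values agree
  if(col1 is null and max(col1) over win2 = max(col1) over win3,
    max(col1) over win2, col1
  ) as new_col1
from (
  -- grp = number of non-null col1 values in all preceding rows
  select *, count(*) over win1 - countif(col1 is null) over win1 as grp
  from t1
  window win1 as (order by rowNum rows between unbounded preceding and 1 preceding)
)
window
  win2 as (partition by grp),  -- a run of nulls plus the non-null value that ends it
  win3 as (order by grp range between 1 preceding and 1 preceding)  -- the group just before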

Applied to the sample data in your question, new_col1 is 2 in rows 11, 13 and 14 and 3 in row 24; every other null is left as is.
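
To see how the second approach forms its groups, the inner query can be run on its own. This is a minimal sketch on a shortened copy of the sample (rows 9 to 16 and row 22, chosen here for illustration); the grp expression is the one used in that query.

```sql
-- Shortened sample: rows 9-16 and 22 from the question's data.
with
  t1 as (
    select 9 as rowNum, 2 as col1 union all
    select 10, 2 union all
    select 11, null union all
    select 12, 2 union all
    select 13, null union all
    select 14, null union all
    select 15, 2 union all
    select 16, null union all
    select 22, 3
  )

select
  rowNum,
  col1,
  -- rows seen so far minus nulls seen so far
  -- = number of non-null col1 values strictly before this row
  count(*) over win1 - countif(col1 is null) over win1 as grp
from t1
window win1 as (order by rowNum rows between unbounded preceding and 1 preceding)
order by rowNum
```

Because grp only counts earlier rows, a non-null row shares its grp with the nulls directly before it. Each group is therefore a run of nulls together with the non-null value that ends the run, so max(col1) within the group is the next non-null value and max(col1) over the preceding group is the previous one.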
