What we are trying to accomplish is hard to describe in words but easy to show with an example. We have an integer column that only increases within a partition and that also contains many null values:
with
t1 as (
select 1 as rowNum, null as col1 union all
select 2 as rowNum, null as col1 union all
select 3 as rowNum, 1 as col1 union all
select 4 as rowNum, null as col1 union all
select 5 as rowNum, null as col1 union all
select 6 as rowNum, null as col1 union all
select 7 as rowNum, null as col1 union all
select 8 as rowNum, null as col1 union all
select 9 as rowNum, 2 as col1 union all
select 10 as rowNum, 2 as col1 union all
select 11 as rowNum, null as col1 union all
select 12 as rowNum, 2 as col1 union all
select 13 as rowNum, null as col1 union all
select 14 as rowNum, null as col1 union all
select 15 as rowNum, 2 as col1 union all
select 16 as rowNum, null as col1 union all
select 17 as rowNum, null as col1 union all
select 18 as rowNum, null as col1 union all
select 19 as rowNum, null as col1 union all
select 20 as rowNum, null as col1 union all
select 21 as rowNum, null as col1 union all
select 22 as rowNum, 3 as col1 union all
select 23 as rowNum, 3 as col1 union all
select 24 as rowNum, null as col1 union all
select 25 as rowNum, 3 as col1 union all
select 26 as rowNum, 3 as col1 union all
select 27 as rowNum, null as col1 union all
select 28 as rowNum, null as col1 union all
select 29 as rowNum, null as col1 union all
select 30 as rowNum, 4 as col1 union all
select 31 as rowNum, 4 as col1 union all
select 32 as rowNum, null as col1 union all
select 33 as rowNum, null as col1
)
select * from t1
Most of the null values in col1 should be kept. However, if a null value falls between two occurrences of the same integer, it should be replaced with that integer. In the example above, the nulls in rows 11, 13 and 14 should be replaced with 2, and the null in row 24 should be replaced with 3, because these rows fall between two occurrences of the same integer. All other null values should remain null.
This can be solved with window functions. The window part1 looks back and the window part2 looks forward. If last_value is the same in both cases, take that value; otherwise return null.
with
t1 as (
select 1 as rowNum, null as col1 union all
select 2 as rowNum, null as col1 union all
select 3 as rowNum, 1 as col1 union all
select 4 as rowNum, null as col1 union all
select 5 as rowNum, null as col1 union all
select 6 as rowNum, null as col1 union all
select 7 as rowNum, null as col1 union all
select 8 as rowNum, null as col1 union all
select 9 as rowNum, 2 as col1 union all
select 10 as rowNum, 2 as col1 union all
select 11 as rowNum, null as col1 union all
select 12 as rowNum, 2 as col1 union all
select 13 as rowNum, null as col1 union all
select 14 as rowNum, null as col1 union all
select 15 as rowNum, 2 as col1 union all
select 16 as rowNum, null as col1 union all
select 17 as rowNum, null as col1 union all
select 18 as rowNum, null as col1 union all
select 19 as rowNum, null as col1 union all
select 20 as rowNum, null as col1 union all
select 21 as rowNum, null as col1 union all
select 22 as rowNum, 3 as col1 union all
select 23 as rowNum, 3 as col1 union all
select 24 as rowNum, null as col1 union all
select 25 as rowNum, 3 as col1 union all
select 26 as rowNum, 3 as col1 union all
select 27 as rowNum, null as col1 union all
select 28 as rowNum, null as col1 union all
select 29 as rowNum, null as col1 union all
select 30 as rowNum, 4 as col1 union all
select 31 as rowNum, 4 as col1 union all
select 32 as rowNum, null as col1 union all
select 33 as rowNum, null as col1
)
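select *,
  -- keep the value only when the nearest non-null values before and after agree
  if(last_value(col1 ignore nulls) over part1 = last_value(col1 ignore nulls) over part2,
     last_value(col1 ignore nulls) over part1,
     null) as col1_new
from t1
window
  -- part1 looks back: last non-null value at or before the current row
  part1 as (order by rowNum asc rows between unbounded preceding and current row),
  -- part2 looks forward: first non-null value at or after the current row
  part2 as (order by rowNum desc rows between unbounded preceding and current row)
order by rowNum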
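To make the look-back / look-forward idea easier to trace outside BigQuery, here is a minimal sketch of the same logic in plain Python. It is only an illustration, not part of the answer's query: the function name fill_between_equal and the variable names are made up for this example, and the sample list is rebuilt from the rowNum/col1 pairs in the question.

```python
from typing import List, Optional


def fill_between_equal(values: List[Optional[int]]) -> List[Optional[int]]:
    """Keep a value only where the nearest non-null before and after agree."""
    n = len(values)

    # Backward-looking pass: last non-null value at or before each position
    # (the role played by part1 in the query).
    back: List[Optional[int]] = []
    last = None
    for v in values:
        if v is not None:
            last = v
        back.append(last)

    # Forward-looking pass: first non-null value at or after each position
    # (the role played by part2 in the query).
    fwd: List[Optional[int]] = [None] * n
    nxt = None
    for i in range(n - 1, -1, -1):
        if values[i] is not None:
            nxt = values[i]
        fwd[i] = nxt

    # Equal on both sides -> take the value, otherwise null.
    return [b if b is not None and b == f else None for b, f in zip(back, fwd)]


# Sample data from the question: rowNum -> col1 for the non-null rows only.
non_null = {3: 1, 9: 2, 10: 2, 12: 2, 15: 2, 22: 3, 23: 3, 25: 3, 26: 3, 30: 4, 31: 4}
col1 = [non_null.get(r) for r in range(1, 34)]

col1_new = fill_between_equal(col1)
filled_rows = [r for r in range(1, 34) if col1[r - 1] is None and col1_new[r - 1] is not None]
print(filled_rows)  # [11, 13, 14, 24]
```

Rows that already hold a value see that same value in both directions, so they pass through unchanged, just as they do in the SQL.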
Consider also the approach below:
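select * except(grp),
  -- fill a null only when the previous and the next non-null values match
  if(col1 is null and max(col1) over win2 = max(col1) over win3,
     max(col1) over win2, col1
  ) new_col1
from (
  -- grp = number of non-null values in the preceding rows,
  -- so each group is a run of nulls closed by a single non-null value
  select *, count(*) over win1 - countif(col1 is null) over win1 as grp
  from t1
  window win1 as (order by rowNum rows between unbounded preceding and 1 preceding)
)
window
  -- win2: the current group, whose only non-null value is the next one
  win2 as (partition by grp),
  -- win3: the previous group, whose only non-null value is the previous one
  win3 as (order by grp range between 1 preceding and 1 preceding)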
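If applied to the sample data in your question, this query replaces the nulls in rows 11, 13 and 14 with 2 and the null in row 24 with 3, and leaves all other nulls unchanged.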
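The grouping trick in this second query can be traced by hand with a small Python sketch. This is an assumption-laden illustration rather than a translation of the BigQuery execution: fill_by_groups and closing are hypothetical names, and the sample list is rebuilt from the question's rows.

```python
from typing import Dict, List, Optional


def fill_by_groups(values: List[Optional[int]]) -> List[Optional[int]]:
    """Fill nulls whose group closes with the same value as the previous group."""
    # grp[i] = number of non-null values strictly before position i
    # (what the win1 expression computes in the query).
    grp: List[int] = []
    seen = 0
    for v in values:
        grp.append(seen)
        if v is not None:
            seen += 1

    # Each group contains at most one non-null value: the one that closes it.
    closing: Dict[int, int] = {}
    for g, v in zip(grp, values):
        if v is not None:
            closing[g] = v

    out: List[Optional[int]] = []
    for g, v in zip(grp, values):
        nxt = closing.get(g)        # like max(col1) over win2
        prev = closing.get(g - 1)   # like max(col1) over win3
        if v is None and nxt is not None and nxt == prev:
            out.append(nxt)
        else:
            out.append(v)
    return out


# Sample data from the question: rowNum -> col1 for the non-null rows only.
sample = {3: 1, 9: 2, 10: 2, 12: 2, 15: 2, 22: 3, 23: 3, 25: 3, 26: 3, 30: 4, 31: 4}
values = [sample.get(r) for r in range(1, 34)]

result = fill_by_groups(values)
changed = [r for r in range(1, 34) if values[r - 1] != result[r - 1]]
print(changed)  # [11, 13, 14, 24]
```

Because every group ends at its single non-null value, comparing a group's value with the previous group's value is the same as comparing the next and previous non-null neighbours of a null row.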