
Efficient forward fill in BigQuery

I am trying to forward fill a table in BigQuery, but the query runs out of resources when it executes. The table is 2 GB and looks like this:

with t as (
    select timestamp '2021-05-01 00:00:01' as time, 10 as number union all
    select timestamp '2021-05-01 05:00:01' as time, NULL as number union all
    select timestamp '2021-05-01 23:00:01' as time, 20 as number union all
    select timestamp '2021-05-02 00:00:01' as time, NULL as number union all
    select timestamp '2021-05-02 01:00:01' as time, NULL as number union all 
    select timestamp '2021-05-02 05:00:01' as time, 12 as number
)

time number
2021-05-01 00:00:01 10
2021-05-01 05:00:01 NULL
2021-05-01 23:00:01 20
2021-05-02 00:00:01 NULL
2021-05-02 01:00:01 NULL
2021-05-02 05:00:01 12

The desired output is:

time number
2021-05-01 00:00:01 10
2021-05-01 05:00:01 10
2021-05-01 23:00:01 20
2021-05-02 00:00:01 20
2021-05-02 01:00:01 20
2021-05-02 05:00:01 12

My solution at the moment is:

SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(ORDER BY time) AS number
FROM t

It throws:

Resources exceeded during query execution: The query could not be executed in the allotted memory.

The problem is the OVER clause with ORDER BY and no PARTITION BY. When I partition by day, the query executes successfully:

SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(PARTITION BY DATETIME_TRUNC(time, day) ORDER BY time) AS number
FROM t

time number
2021-05-01 00:00:01 10
2021-05-01 05:00:01 10
2021-05-01 23:00:01 20
2021-05-02 00:00:01 NULL
2021-05-02 01:00:01 NULL
2021-05-02 05:00:01 12

The result still contains NULL values, although about 500 times fewer than the original table. I am not sure whether the problem can be solved starting from this. Is there an efficient way to do it?

Try one of the following:

SELECT time, 
NTH_VALUE(number, 1 IGNORE NULLS) OVER(ORDER BY time DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS number
FROM t

or

SELECT time, 
  FIRST_VALUE(number IGNORE NULLS) OVER(ORDER BY time DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS number
FROM t    

I don't have a good sample of real data to test with, so this is just a guess.
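
For what it's worth, the suggestion can at least be checked against the sample rows from the question. The sketch below only combines the question's CTE with the FIRST_VALUE variant. Whether it avoids the memory error on the real 2 GB table is not something this check can show, since the window is still unpartitioned.

```sql
WITH t AS (
  SELECT TIMESTAMP '2021-05-01 00:00:01' AS time, 10 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 05:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 23:00:01' AS time, 20 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 00:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 01:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 05:00:01' AS time, 12 AS number
)
SELECT
  time,
  -- Rows are ordered newest first, so the frame "current row to the end"
  -- covers the current row and every earlier row. The first non-NULL value
  -- in that frame is the most recent known value.
  FIRST_VALUE(number IGNORE NULLS) OVER (
    ORDER BY time DESC
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS number
FROM t
```

On these six rows the filled values are 10, 10, 20, 20, 20, 12 in ascending time order, which matches the desired output in the question.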

I changed the datetime partition from day to month, and the values were filled.

The following resolved it for me:

with t as (
        select timestamp '2021-05-01 00:00:01' as time, 10 as number union all
        select timestamp '2021-05-01 05:00:01' as time, NULL as number union all
        select timestamp '2021-05-01 23:00:01' as time, 20 as number union all
        select timestamp '2021-05-02 00:00:01' as time, NULL as number union all
        select timestamp '2021-05-02 01:00:01' as time, NULL as number union all 
        select timestamp '2021-05-02 05:00:01' as time, 12 as number
    )


SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(PARTITION BY DATETIME_TRUNC(time, month) ORDER BY time) AS number
FROM t
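
One caveat: a coarser partition can still leave NULLs in rows at the start of a partition that come before its first non-NULL value. A possible alternative, sketched below, keeps the cheap per-day partition and then carries each day's last known value over to the following days. The CTE and column names (filled_in_day, day_last, day_carry, day_start, last_number, carried_number) are made up for this illustration, and TIMESTAMP_TRUNC is used on the assumption that time is a TIMESTAMP, as in the sample data. I have only traced it against the six sample rows, not run it on a large table.

```sql
WITH t AS (
  SELECT TIMESTAMP '2021-05-01 00:00:01' AS time, 10 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 05:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 23:00:01' AS time, 20 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 00:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 01:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 05:00:01' AS time, 12 AS number
),
-- Step 1: forward fill inside each day (the query that already ran fine).
filled_in_day AS (
  SELECT
    time,
    TIMESTAMP_TRUNC(time, DAY) AS day_start,
    LAST_VALUE(number IGNORE NULLS) OVER (
      PARTITION BY TIMESTAMP_TRUNC(time, DAY) ORDER BY time
    ) AS number
  FROM t
),
-- Step 2: one row per day holding that day's last non-NULL value.
day_last AS (
  SELECT
    TIMESTAMP_TRUNC(time, DAY) AS day_start,
    ARRAY_AGG(number IGNORE NULLS ORDER BY time DESC LIMIT 1)[SAFE_OFFSET(0)]
      AS last_number
  FROM t
  GROUP BY day_start
),
-- Step 3: for each day, the last known value from any earlier day.
-- This window is unpartitioned, but it only sees one row per day.
day_carry AS (
  SELECT
    day_start,
    LAST_VALUE(last_number IGNORE NULLS) OVER (
      ORDER BY day_start
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS carried_number
  FROM day_last
)
-- Step 4: rows still NULL after step 1 take the value carried into their day.
SELECT
  f.time,
  COALESCE(f.number, c.carried_number) AS number
FROM filled_in_day AS f
JOIN day_carry AS c USING (day_start)
```

On the sample rows, step 1 leaves the two early rows of 2021-05-02 as NULL, step 3 carries 20 into that day, and the final result is 10, 10, 20, 20, 20, 12 by ascending time. There is deliberately no final ORDER BY, because sorting the full result is itself a global sort.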
