Efficient forward fill in BigQuery

I am trying to forward fill a table in BigQuery, but the query runs out of resources during execution. The table is 2 GB. It looks like this:

with t as (
    select timestamp '2021-05-01 00:00:01' as time, 10 as number union all
    select timestamp '2021-05-01 05:00:01' as time, NULL as number union all
    select timestamp '2021-05-01 23:00:01' as time, 20 as number union all
    select timestamp '2021-05-02 00:00:01' as time, NULL as number union all
    select timestamp '2021-05-02 01:00:01' as time, NULL as number union all 
    select timestamp '2021-05-02 05:00:01' as time, 12 as number
)
time number
2021-05-01 00:00:01 10
2021-05-01 05:00:01 NULL
2021-05-01 23:00:01 20
2021-05-02 00:00:01 NULL
2021-05-02 01:00:01 NULL
2021-05-02 05:00:01 12

The desired output is:

time number
2021-05-01 00:00:01 10
2021-05-01 05:00:01 10
2021-05-01 23:00:01 20
2021-05-02 00:00:01 20
2021-05-02 01:00:01 20
2021-05-02 05:00:01 12

My current solution is:

SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(ORDER BY time) AS number
FROM t

It throws:

Resources exceeded during query execution: The query could not be executed in the allotted memory.

The problem is the ORDER BY in the OVER clause. I tried running the query partitioned by day, and it executed successfully.

SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(PARTITION BY DATETIME_TRUNC(time, day) ORDER BY time) AS number
FROM t
time number
2021-05-01 00:00:01 10
2021-05-01 05:00:01 10
2021-05-01 23:00:01 20
2021-05-02 00:00:01 NULL
2021-05-02 01:00:01 NULL
2021-05-02 05:00:01 12

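The problem is that the result still contains NULL values, although about 500 times fewer than the original table. I am not sure whether the problem can be solved by building on this. Is there an efficient way to solve this?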

Try the below:

SELECT time, 
NTH_VALUE(number, 1 IGNORE NULLS) OVER(ORDER BY time DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS number
FROM t

or

SELECT time, 
  FIRST_VALUE(number IGNORE NULLS) OVER(ORDER BY time DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS number
FROM t    

I don't have a good example of real data to test with, so this is just a guess.
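To check the guess on something concrete, the second query can be run against the sample rows from the question. This is only a sketch on six rows. The final `ORDER BY time` is added here for readable output and is not part of the original suggestion. Note that the window still has no PARTITION BY, so on the full table it may well hit the same memory limit as the original query; that part is an assumption, not something tested here.

```sql
-- Sample data copied from the question.
WITH t AS (
  SELECT TIMESTAMP '2021-05-01 00:00:01' AS time, 10 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 05:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 23:00:01' AS time, 20 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 00:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 01:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 05:00:01' AS time, 12 AS number
)
SELECT
  time,
  -- Scanning backwards in time from the current row, take the first
  -- non-NULL value: this is the most recent known value.
  FIRST_VALUE(number IGNORE NULLS) OVER (
    ORDER BY time DESC
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS number
FROM t
ORDER BY time
```

On these six rows the result should match the desired output in the question (10, 10, 20, 20, 20, 12).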

Changing the datetime partition from day to month filled the gaps.

The following solved it for me:

with t as (
        select timestamp '2021-05-01 00:00:01' as time, 10 as number union all
        select timestamp '2021-05-01 05:00:01' as time, NULL as number union all
        select timestamp '2021-05-01 23:00:01' as time, 20 as number union all
        select timestamp '2021-05-02 00:00:01' as time, NULL as number union all
        select timestamp '2021-05-02 01:00:01' as time, NULL as number union all 
        select timestamp '2021-05-02 05:00:01' as time, 12 as number
    )


SELECT time,
LAST_VALUE(number IGNORE NULLS) OVER(PARTITION BY DATETIME_TRUNC(time, month) ORDER BY time) AS number
FROM t
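One caveat with partitioning by month: if a month starts with NULL rows, those rows have no earlier value inside their partition and stay NULL. A possible way to build on the per-day partition from the question is to fill within each day first, then carry each day's last known value into the following days using a much smaller one-row-per-day table. The sketch below assumes `time` is a TIMESTAMP (hence `TIMESTAMP_TRUNC`, which truncates in UTC by default); the CTE names `filled_in_day`, `day_last`, `day_carry` and the column `day_start` are made up for illustration.

```sql
-- Sample data from the question; replace `t` with the real table.
WITH t AS (
  SELECT TIMESTAMP '2021-05-01 00:00:01' AS time, 10 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 05:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-01 23:00:01' AS time, 20 AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 00:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 01:00:01' AS time, NULL AS number UNION ALL
  SELECT TIMESTAMP '2021-05-02 05:00:01' AS time, 12 AS number
),
-- Step 1: forward fill inside each day (many small window partitions).
filled_in_day AS (
  SELECT
    time,
    TIMESTAMP_TRUNC(time, DAY) AS day_start,
    LAST_VALUE(number IGNORE NULLS) OVER (
      PARTITION BY TIMESTAMP_TRUNC(time, DAY) ORDER BY time
    ) AS number
  FROM t
),
-- Step 2: one row per day with that day's last non-NULL value
-- (NULL if the whole day is NULL).
day_last AS (
  SELECT
    TIMESTAMP_TRUNC(time, DAY) AS day_start,
    ARRAY_AGG(number IGNORE NULLS ORDER BY time DESC LIMIT 1)[SAFE_OFFSET(0)]
      AS last_number
  FROM t
  GROUP BY day_start
),
-- Step 3: for each day, the value carried in from earlier days.
-- This window is unpartitioned, but it only sees one row per day.
day_carry AS (
  SELECT
    day_start,
    LAST_VALUE(last_number IGNORE NULLS) OVER (
      ORDER BY day_start
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS carried_number
  FROM day_last
)
-- Step 4: rows still NULL after step 1 take the carried-in value.
SELECT
  f.time,
  COALESCE(f.number, c.carried_number) AS number
FROM filled_in_day AS f
JOIN day_carry AS c USING (day_start)
ORDER BY f.time
```

On the sample rows this should reproduce the desired output from the question. Rows before the very first non-NULL value in the table remain NULL, since there is nothing to carry forward. Whether this fits in memory on the real 2 GB table has not been tested here.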
