Performing SQL updates based on row number and using previous rows for calculations
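I have Python/pandas code that I use to perform some calculations, but I'm running into performance problems. I'm trying to rewrite everything in SQL and update the table with BigQuery.
The problem I'm facing is updating an existing table based on the row number, where the calculation uses previous rows.
Below is the code I'm currently using; now I need to do the same thing in SQL. `i` is the (0-based) row number.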
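```python
a_col, b_col = perfil.columns.get_loc('A'), perfil.columns.get_loc('B')
for i in range(len(perfil)):  # i = 0-based row position
    if i <= 4:
        perfil.iat[i, b_col] = 0  # first 5 rows: no dependency on earlier rows
    else:
        # A of the current row plus B from two rows back
        perfil.iat[i, b_col] = perfil.iat[i, a_col] + perfil.iat[i - 2, b_col]
```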
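So for the first 5 rows, B is set without using any earlier rows (it is simply 0). From the 6th row on, each calculation uses the value of B from two rows earlier.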
My table has already been created like this:
| DEPTH_M | A | B |
|-----------|---|---|
|1.2 |2 |0 |
|1.4 |3 |0 |
|1.6 |6 |0 |
|1.8 |2 |0 |
|2.0 |1 |0 |
|2.2 |6 |0 |
|2.4 |7 |0 |
|2.6 |6 |0 |
在一些用戶輸入之后,我需要使用我之前顯示的代碼在表中執行更新,結果如下:
| DEPTH_M | A | B |
|-----------|---|------------------------------------------------------|
|1.2 |2 |0 |
|1.4 |3 |0 |
|1.6 |6 |0 |
|1.8 |2 |0 |
|2.0 |1 |0 (Zero till now, first 5 rows are filled with zeros)|
|2.2 |6 |6 (6 from A + 0 from the past two rows) |
|2.4 |7 |7 (7 from A + 0 from the past two rows) |
|2.6 |6 |12 (6 from A + 6 from the past two rows) |
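To check the expected result above, here is a minimal pandas sketch that rebuilds the sample table and runs the row-by-row loop. It assumes `perfil` is a `DataFrame` with a default 0-based index and the columns `DEPTH_M`, `A`, `B` from the table; this loop is the slow version the question wants to move into SQL.

```python
import pandas as pd

# Sample data from the question's table (B starts as all zeros).
perfil = pd.DataFrame({
    "DEPTH_M": [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6],
    "A": [2, 3, 6, 2, 1, 6, 7, 6],
    "B": [0] * 8,
})

a_col = perfil.columns.get_loc("A")
b_col = perfil.columns.get_loc("B")

# Row-by-row recurrence: B[i] = 0 for the first five rows (i <= 4),
# otherwise B[i] = A[i] + B[i - 2].
for i in range(len(perfil)):
    if i <= 4:
        perfil.iat[i, b_col] = 0
    else:
        perfil.iat[i, b_col] = perfil.iat[i, a_col] + perfil.iat[i - 2, b_col]

print(perfil)
```

Each row depends on a value written two iterations earlier, which is why this does not vectorize directly and is slow on large tables.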
Thanks in advance!
Consider the example below:
```sql
#standardSQL
with `project.dataset.table` as (
  select 1.2 DEPTH_M, 2 A, 0 B union all
  select 1.4, 3, 0 union all
  select 1.6, 6, 0 union all
  select 1.8, 2, 0 union all
  select 2.0, 1, 0 union all
  select 2.2, 6, 0 union all
  select 2.4, 7, 0 union all
  select 2.6, 6, 0 union all
  select 3.0, 1, 0 union all
  select 3.2, 6, 0 union all
  select 3.4, 7, 0
)
select DEPTH_M, A,
  ifnull(if(rn <= 5, 0, A) + sum(if(rn <= 5, 0, A)) over win, 0) as B,
  format('%i + %i', if(rn <= 5, 0, A), ifnull(sum(if(rn <= 5, 0, A)) over win, 0)) as explanation
from (
  select DEPTH_M, A, B,
    row_number() over(order by DEPTH_M) rn
  from `project.dataset.table`
)
window win as (partition by mod(rn, 2) order by rn rows between unbounded preceding and 1 preceding)
order by DEPTH_M
```
With output:

[Output screenshot not preserved]
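To trace how the window in this query produces B, here is a pandas sketch. It is an assumed translation of the query, not BigQuery itself: `row_number()` becomes a 1-based counter, `partition by mod(rn, 2)` becomes a `groupby` on `rn % 2`, and the `rows between unbounded preceding and 1 preceding` frame becomes a per-group running total minus the current row (empty for the first row of each group, i.e. NULL in SQL).

```python
import pandas as pd

# The 11-row sample used in the answer's CTE.
df = pd.DataFrame({
    "DEPTH_M": [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 3.0, 3.2, 3.4],
    "A": [2, 3, 6, 2, 1, 6, 7, 6, 1, 6, 7],
})

df = df.sort_values("DEPTH_M").reset_index(drop=True)
rn = pd.Series(range(1, len(df) + 1))  # row_number() over (order by DEPTH_M)
masked = df["A"].where(rn > 5, 0)      # if(rn <= 5, 0, A)
parity = rn % 2                        # partition by mod(rn, 2)

# Running total per parity group, then subtract the current row to get the
# "unbounded preceding and 1 preceding" frame. The first row of each group
# has no preceding rows, so its sum is NULL in SQL (NaN here).
running = masked.groupby(parity).cumsum()
preceding = (running - masked).mask(~parity.duplicated())

# ifnull(current + preceding_sum, 0)
df["B"] = (masked + preceding).fillna(0).astype(int)
print(df)
```

Adding the current row back to the exclusive running total is the same as an inclusive running total, which is why the `ifnull` only matters for the first row of each parity group.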
If you need to update the table, you can use the following:
```sql
update `project.dataset.table` u
set B = s.B
from (
  select DEPTH_M, A, ifnull(if(rn <= 5, 0, A) + sum(if(rn <= 5, 0, A)) over win, 0) as B
  from (
    select DEPTH_M, A, B,
      row_number() over(order by DEPTH_M) rn
    from `project.dataset.table`
  )
  window win as (partition by mod(rn, 2) order by rn rows between unbounded preceding and 1 preceding)
) s
where u.DEPTH_M = s.DEPTH_M
and u.A = s.A;
```
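The `UPDATE ... FROM` statement computes the new B values in a subquery and writes them back by matching rows on `DEPTH_M` and `A`. The same compute-then-join-back pattern can be sketched in pandas with `merge`. This assumes, as the statement does, that the pair (`DEPTH_M`, `A`) identifies a row uniquely; `compute_b` is a hypothetical helper standing in for the subquery `s`.

```python
import pandas as pd


def compute_b(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring subquery `s`: returns DEPTH_M, A and the new B."""
    s = table.sort_values("DEPTH_M").reset_index(drop=True)
    rn = pd.Series(range(1, len(s) + 1))
    masked = s["A"].where(rn > 5, 0)
    # Inclusive running total per parity of rn; equals current + preceding sum.
    s["B"] = masked.groupby(rn % 2).cumsum()
    return s[["DEPTH_M", "A", "B"]]


table = pd.DataFrame({
    "DEPTH_M": [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6],
    "A": [2, 3, 6, 2, 1, 6, 7, 6],
    "B": [0] * 8,
})

s = compute_b(table)
# "set B = s.B ... where u.DEPTH_M = s.DEPTH_M and u.A = s.A":
# drop the old B and join the recomputed one back on the same keys.
# validate="one_to_one" raises if the keys are not unique on either side.
table = table.drop(columns="B").merge(
    s, on=["DEPTH_M", "A"], how="left", validate="one_to_one"
)
print(table)
```

If `DEPTH_M` alone is unique (as it appears to be in the sample), joining on it alone would be enough; matching on `A` as well just mirrors the SQL.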
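Your loop implicitly defines a recurrence relation. If you unroll a single term $B_n$ (1-based $n$, with $B_n = 0$ for $n \le 5$), you get an expanded expression like this:

$$B_n = A_n + B_{n-2} = A_n + A_{n-2} + B_{n-4} = A_n + A_{n-2} + A_{n-4} + \cdots$$

Eventually every $B_n$ reduces to a sum of the $A$ terms at indices $n, n-2, n-4, \ldots$ (as long as $n > 5$). If the initial values of the $A$ series are forced to zero, then $B_1$ through $B_5$ evaluate to zero automatically, and this becomes a simple mapping function that doesn't need to store outputs iteratively:

$$B_n = \sum_{\substack{1 \le m \le n \\ m \equiv n \pmod{2}}} A'_m, \qquad A'_m = \begin{cases} 0 & m \le 5 \\ A_m & m > 5 \end{cases}$$

Note that the array indices (subscripts) being combined are all offset by successive multiples of two. This means they all belong to the same congruence class modulo 2, that is, they leave the same remainder when divided by two. That is where partitioning with `mod()` comes in.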
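The unrolling argument can be checked numerically. The sketch below (plain Python; the function names are hypothetical) computes B once with the recurrence and once as the closed-form same-parity sum of zeroed A terms, then compares them on random inputs.

```python
import random


def b_recurrence(a):
    """B_n = 0 for n <= 5, else A_n + B_{n-2} (1-based n, as in the SQL)."""
    b = []
    for n, a_n in enumerate(a, start=1):
        # b is a 0-based list, so B_{n-2} is stored at b[n - 3].
        b.append(0 if n <= 5 else a_n + b[n - 3])
    return b


def b_closed_form(a):
    """B_n = sum of A'_m over m <= n with m = n (mod 2), where A'_m = 0 for m <= 5."""
    a_zeroed = [0 if m <= 5 else a_m for m, a_m in enumerate(a, start=1)]
    return [
        sum(a_zeroed[m - 1] for m in range(n, 0, -2))  # m = n, n-2, n-4, ... >= 1
        for n in range(1, len(a) + 1)
    ]


sample = [2, 3, 6, 2, 1, 6, 7, 6, 1, 6, 7]
print(b_closed_form(sample))

# Both formulations agree on arbitrary inputs.
random.seed(0)
for _ in range(1000):
    a = [random.randint(-10, 10) for _ in range(random.randint(0, 20))]
    assert b_recurrence(a) == b_closed_form(a)
```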
```sql
with r as ( /* attach row numbers starting with 1 */
  select *, row_number() over (order by DEPTH_M) as rn
  from T
)
select DEPTH_M, A,
  /* zero out rows 1-5, then keep a running total per parity of rn */
  sum(case when rn > 5 then A else 0 end)
    over (partition by mod(rn, 2) order by rn) as B
from r
order by DEPTH_M;
```
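The query can be tried locally without BigQuery. The sketch below is an assumed port to SQLite through Python's `sqlite3` module (window functions need SQLite 3.25 or newer): `mod(rn, 2)` is written as `rn % 2`, the CTE selects `*` plus the row number, and the table name `T` is kept from the answer.

```python
import sqlite3

rows = [
    (1.2, 2, 0), (1.4, 3, 0), (1.6, 6, 0), (1.8, 2, 0), (2.0, 1, 0),
    (2.2, 6, 0), (2.4, 7, 0), (2.6, 6, 0), (3.0, 1, 0), (3.2, 6, 0), (3.4, 7, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (DEPTH_M REAL, A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)

# Same shape as the BigQuery query; SQLite uses % instead of mod().
query = """
with r as (
  select *, row_number() over (order by DEPTH_M) as rn
  from T
)
select DEPTH_M, A,
       sum(case when rn > 5 then A else 0 end)
         over (partition by rn % 2 order by rn) as B
from r
order by DEPTH_M
"""
result = conn.execute(query).fetchall()
for depth, a, b in result:
    print(depth, a, b)
conn.close()
```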
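This query doesn't need to handle nulls: it relies on the default window frame of `sum() over (... order by rn)`, which runs from the start of the partition up to and including the current row.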
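Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.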