Select last non-null value and append it to another column (BigQuery/Python)
I have a table in BQ that looks like this:
Row  Field  DateTime
1    one    10:00 AM
2    null   10:05 AM
3    null   10:10 AM
4    one    10:30 AM
5    null   11:00 AM
6    two    11:15 AM
7    two    11:30 AM
8    null   11:35 AM
9    null   11:40 AM
10   null   11:50 AM
11   null   12:00 AM
12   null   12:15 AM
13   two    12:30 AM
14   null   12:15 AM
15   null   12:25 AM
16   null   12:35 AM
17   three  12:55 AM
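I want to create another column named prevField and fill it with the last non-null Field value when the non-null entries immediately before and after a run of nulls are the same. When the entries before and after the nulls are different, it should stay null. The result should look like this: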
Row  Field  DateTime  prevField
1    one    10:00 AM  null
2    null   10:05 AM  one
3    null   10:10 AM  one
4    one    10:30 AM  one
5    null   11:00 AM  null
6    two    11:15 AM  two
7    two    11:30 AM  two
8    null   11:35 AM  two
9    null   11:40 AM  two
10   null   11:50 AM  two
11   null   12:00 AM  two
12   null   12:15 AM  two
13   two    12:30 AM  two
14   null   12:15 AM  null
15   null   12:15 AM  null
16   null   12:15 AM  null
17   three  12:15 AM  three
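The rule can be checked outside SQL first. As a minimal sketch, assuming the rows are already in Row order and using a hypothetical `fill_between_equal` helper, each run of consecutive nulls is filled with the nearest non-null value before it only when the nearest non-null value after it is the same:

```python
from typing import List, Optional


def fill_between_equal(fields: List[Optional[str]]) -> List[Optional[str]]:
    """Fill each run of None with the surrounding value, but only when the
    non-null values immediately before and after the run are equal."""
    result = list(fields)
    n = len(fields)
    i = 0
    while i < n:
        if fields[i] is not None:
            i += 1
            continue
        # Find the end (exclusive) of this run of consecutive nulls.
        j = i
        while j < n and fields[j] is None:
            j += 1
        before = fields[i - 1] if i > 0 else None
        after = fields[j] if j < n else None
        # A run at the start or end of the column has only one neighbor,
        # so it is never filled.
        if before is not None and before == after:
            for k in range(i, j):
                result[k] = before
        i = j
    return result


# Field column of the input table above, in Row order.
fields = ["one", None, None, "one", None, "two", "two", None, None, None,
          None, None, "two", None, None, None, "three"]
print(fill_between_equal(fields))
```

This keeps each non-null Field as its own prevField, which matches the expected output on every row except row 1, where the table above shows null; the stated rule does not explain that row, so the sketch does not try to reproduce it.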
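So far I have tried the following variations for the first part of the problem (filling prevField with the last non-null Field value when the entries before and after the nulls are the same), without success: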
select Field, DateTime,
-- Attempt 1:
-- case when Field is null then LAG(Field) over (order by DateTime) else Field end as prevField
-- Attempt 2:
-- LAST_VALUE(Field IGNORE NULLS) OVER (ORDER BY DateTime
--   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS prevField
-- Attempt 3:
-- first_value(Field) over (order by DateTime) as prevField
from table
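To see why these attempts fall short, the sketch below emulates in Python what each window expression computes on the question's Field column. It illustrates the SQL semantics under stated assumptions and is not BigQuery itself: the helper names (`lag_attempt`, `last_value_following`, `last_value_preceding`) are hypothetical, and the rows are assumed to be in Row order. Ordering by the DateTime strings as given would not reproduce that order, since row 14 repeats row 12's `12:15 AM` and rows 14 and 15 both sort before row 13's `12:30 AM`.

```python
from typing import List, Optional

Column = List[Optional[str]]


def lag_attempt(fields: Column) -> Column:
    # Attempt 1: CASE WHEN Field IS NULL THEN LAG(Field) ... ELSE Field END.
    # LAG reads exactly one row back, so only the first null of a run is filled.
    return [f if f is not None else (fields[i - 1] if i > 0 else None)
            for i, f in enumerate(fields)]


def last_value_following(fields: Column) -> Column:
    # Attempt 2: LAST_VALUE(Field IGNORE NULLS) over the frame
    # ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING, i.e. the last
    # non-null value between the current row and the end of the table.
    out: Column = []
    for i in range(len(fields)):
        rest = [f for f in fields[i:] if f is not None]
        out.append(rest[-1] if rest else None)
    return out


def last_value_preceding(fields: Column) -> Column:
    # The same function over ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW:
    # a forward fill that carries the most recent non-null value downward.
    out: Column = []
    last: Optional[str] = None
    for f in fields:
        if f is not None:
            last = f
        out.append(last)
    return out


# Field column of the question's input table, in Row order.
fields: Column = ["one", None, None, "one", None, "two", "two", None, None, None,
                  None, None, "two", None, None, None, "three"]

print(lag_attempt(fields)[:4])              # ['one', 'one', None, 'one']
print(set(last_value_following(fields)))    # {'three'}
print(last_value_preceding(fields)[13:16])  # ['two', 'two', 'two']
```

The LAG variant fills only the first null of each run, the CURRENT ROW AND UNBOUNDED FOLLOWING frame returns `three` (the last non-null value in the table) on every row, and FIRST_VALUE with the default frame returns the first row's `one` everywhere. A frame ending at the current row (or at `1 PRECEDING`) gives the forward fill the first part needs, but it still fills row 5 with `one` and rows 14 to 16 with `two`, so the value after each null run has to be compared as well.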
EDIT: I added rows to the data and changed the row numbers.
You can use the following logic to achieve this. Sample data creation:
WITH
Base AS
(
  SELECT *
  FROM (
    SELECT 123 Row, 'one' Field, '10:00 AM' DateTime
    UNION ALL
    SELECT 123, null, '10:05 AM'
    UNION ALL
    SELECT 123, null, '10:10 AM'
    UNION ALL
    SELECT 123, 'one', '10:30 AM'
    UNION ALL
    SELECT 456, null, '11:00 AM'
    UNION ALL
    SELECT 456, 'two', '11:15 AM'
    UNION ALL
    SELECT 789, 'two', '11:30 AM'))
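Logic: the query gets the minimum and maximum DateTime for each Field value, plus the lag and lead values of Field for each row, and determines the prevField value from these.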
SELECT a.Field, DateTime,
  CASE WHEN a.DateTime = a.min_date THEN ''
       WHEN a.lag_field IS NOT NULL AND a.lead_field IS NULL THEN a.lag_field
       WHEN a.lag_field IS NULL AND a.lead_field IS NOT NULL THEN a.lead_field
       WHEN a.lag_field != a.lead_field THEN a.lag_field
       WHEN a.Field IS NOT NULL AND a.lag_field IS NULL AND a.lead_field IS NULL AND a.DateTime = a.Max_date THEN a.Field
       ELSE ''
  END AS prevField
FROM (
  SELECT Base.Field, DateTime,
         LAG(Base.Field) OVER (ORDER BY DateTime) AS lag_field,
         LEAD(Base.Field) OVER (ORDER BY DateTime) AS lead_field,
         min_date, Max_date
  FROM Base
  LEFT JOIN (SELECT Field, MIN(DateTime) min_date, MAX(DateTime) Max_date FROM Base GROUP BY Field) b
    ON Base.Field = b.Field
) a
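Tracing the CASE expression by hand is error-prone, so here is a minimal sketch that transcribes it into Python (the `answer_case` name is hypothetical). It assumes the rows are already sorted by DateTime and treats every comparison with NULL as not true, as SQL does:

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[Optional[str], str]  # (Field, DateTime)


def answer_case(rows: List[Row]) -> List[str]:
    # Rows must already be sorted by DateTime, as ORDER BY DateTime does.
    fields = [f for f, _ in rows]
    # MIN/MAX(DateTime) per Field; NULL Fields find no match in the LEFT JOIN.
    min_date: Dict[str, str] = {}
    max_date: Dict[str, str] = {}
    for f, dt in rows:
        if f is not None:
            min_date[f] = min(min_date.get(f, dt), dt)
            max_date[f] = max(max_date.get(f, dt), dt)
    out: List[str] = []
    for i, (f, dt) in enumerate(rows):
        lag = fields[i - 1] if i > 0 else None
        lead = fields[i + 1] if i + 1 < len(rows) else None
        mn = min_date.get(f) if f is not None else None
        mx = max_date.get(f) if f is not None else None
        # Each branch below mirrors one WHEN, with explicit NULL checks.
        if mn is not None and dt == mn:
            out.append("")
        elif lag is not None and lead is None:
            out.append(lag)
        elif lag is None and lead is not None:
            out.append(lead)
        elif lag is not None and lead is not None and lag != lead:
            out.append(lag)
        elif f is not None and lag is None and lead is None and dt == mx:
            out.append(f)
        else:
            out.append("")
    return out


# The seven rows of the Base CTE, as (Field, DateTime).
rows: List[Row] = [("one", "10:00 AM"), (None, "10:05 AM"), (None, "10:10 AM"),
                   ("one", "10:30 AM"), (None, "11:00 AM"), ("two", "11:15 AM"),
                   ("two", "11:30 AM")]
print(answer_case(rows))  # ['', 'one', 'one', 'one', 'one', '', 'two']
```

On the seven sample rows it yields `'one'` for the 11:00 AM row, where the question expects null, because the `lag_field != lead_field` branch returns `lag_field`. A null whose neighbors are both null, as in the longer runs added in the question's edit, matches no WHEN branch and falls through to `ELSE ''`.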
This query partially solves my problem:
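-- Encode a non-negative INT64 as a fixed-width, 16-character lowercase hex string.
CREATE TEMP FUNCTION ToHex(x INT64) AS (
  (SELECT STRING_AGG(FORMAT('%02x', x >> (byte * 8) & 0xff), '' ORDER BY byte DESC)
   FROM UNNEST(GENERATE_ARRAY(0, 7)) AS byte)
);
SELECT
  DateTime,
  Field,
  -- The hex prefix makes MAX pick the latest row; a NULL Field makes the
  -- concatenation NULL, which MAX skips. SUBSTR(..., 17) drops the prefix.
  SUBSTR(MAX(ToHex(row_n) || Field) OVER (ORDER BY row_n ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 17) AS previous
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY DateTime) AS row_n
  FROM `xx.yy.zz`
);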
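The trick in this query is that a fixed-width hex row number prefixed to Field makes MAX return the most recent non-null value. A minimal Python emulation, assuming Row order and using hypothetical `to_hex` and `previous_non_null` helpers:

```python
from typing import List, Optional


def to_hex(x: int) -> str:
    # Mirrors the ToHex UDF for non-negative x: eight bytes, most significant
    # first, two lowercase hex digits each, i.e. a fixed 16-character string.
    return format(x, "016x")


def previous_non_null(fields: List[Optional[str]]) -> List[Optional[str]]:
    out: List[Optional[str]] = []
    best: Optional[str] = None  # running MAX over rows strictly before this one
    for row_n, f in enumerate(fields, start=1):
        # SUBSTR(..., 17) is 1-based: keep everything after the 16-char prefix.
        out.append(best[16:] if best is not None else None)
        # ToHex(row_n) || NULL is NULL and MAX skips NULLs, so null rows never
        # win; the fixed-width prefix makes string order follow row_n.
        if f is not None:
            key = to_hex(row_n) + f
            if best is None or key > best:
                best = key
    return out


fields = ["one", None, None, "one", None, "two", "two", None, None, None,
          None, None, "two", None, None, None, "three"]
print(previous_non_null(fields))
```

On the question's data this gives null for row 1, as in the expected output, but on non-null rows it reports the previous value rather than the current one (row 6 gets `one`, not `two`), and it never looks at the value after a null run, which is why it only partially solves the problem. The value after a run can be obtained the same way with `MIN` over `ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING`, and a run filled only when the two values agree.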