Select last non-null value and append it to another column (BigQuery/Python)

I have a table in BQ that looks like this:

Row     Field  DateTime  
1        one   10:00 AM    
2        null  10:05 AM     
3        null  10:10 AM    
4        one   10:30 AM    
5        null  11:00 AM    
6        two   11:15 AM    
7        two   11:30 AM 
8        null  11:35 AM
9        null  11:40 AM
10       null  11:50 AM
11       null  12:00 AM
12       null  12:15 AM
13       two   12:30 AM
14       null  12:15 AM
15       null  12:25 AM
16       null  12:35 AM
17       three 12:55 AM     

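I want to create another column called prevField and fill it with the last non-null Field value, but only when the first and last entries surrounding a run of nulls are the same. When the entries before and after the nulls differ, prevField should stay null. The result would look like this: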

  Row     Field    DateTime  prevField
    1        one   10:00 AM   null 
    2        null  10:05 AM   one    
    3        null  10:10 AM   one   
    4        one   10:30 AM   one
    5        null  11:00 AM   null
    6        two   11:15 AM   two
    7        two   11:30 AM   two
    8        null  11:35 AM   two
    9        null  11:40 AM   two
    10       null  11:50 AM   two
    11       null  12:00 AM   two
    12       null  12:15 AM   two
    13       two   12:30 AM   two 
    14       null  12:15 AM   null
    15       null  12:15 AM   null
    16       null  12:15 AM   null
    17       three 12:15 AM   three  
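The rule described above can be sketched in plain Python to make the intended behavior concrete. This is only an illustration under stated assumptions, not a BigQuery solution: the function name `fill_prev_field` is hypothetical, rows are assumed to already be in the intended order, and a non-null row is assumed to keep its own value.

```python
from typing import List, Optional


def fill_prev_field(fields: List[Optional[str]]) -> List[Optional[str]]:
    """Hypothetical helper: keep a value only where the nearest non-null
    value at or before the row equals the nearest non-null value at or
    after the row. Rows are assumed to be sorted already."""
    n = len(fields)

    # Forward pass: last non-null value seen at or before each row.
    prev_seen: List[Optional[str]] = [None] * n
    last = None
    for i, value in enumerate(fields):
        if value is not None:
            last = value
        prev_seen[i] = last

    # Backward pass: next non-null value seen at or after each row.
    next_seen: List[Optional[str]] = [None] * n
    nxt = None
    for i in range(n - 1, -1, -1):
        if fields[i] is not None:
            nxt = fields[i]
        next_seen[i] = nxt

    # Fill only when both neighbours agree; otherwise leave null.
    return [
        p if p is not None and p == q else None
        for p, q in zip(prev_seen, next_seen)
    ]


if __name__ == "__main__":
    sample = ["one", None, None, "one", None, "two", "two", None, "two", None, "three"]
    for field, prev in zip(sample, fill_prev_field(sample)):
        print(field, prev)
```

On the 17 rows shown above, this sketch agrees with the expected prevField column for rows 2 to 17. It returns 'one' for row 1, whereas the expected table shows null there, so matching that row exactly would need an extra first-row special case.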

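So far, for the first part of the problem (filling prevField with the last non-null Field value when the first and last entries around the nulls are the same), I have tried the following code variants, without success.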

select Field, DateTime,
-- (1) case when FieldName is null then LAG(FieldName) over (order by DateTime) else FieldName end as prevFieldName
-- (2) LAST_VALUE(FieldName IGNORE NULLS) OVER (ORDER BY DateTime
--     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS prevFieldName
-- (3) first_value(FieldName) over (order by DateTime) as prevFieldName
from table
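One likely reason variant (1) falls short is that LAG looks back exactly one row, so in a run of consecutive nulls only the first null gets a value. The Python sketch below mimics that behavior on a plain list; `lag_fill` is a hypothetical name used only for illustration.

```python
from typing import List, Optional


def lag_fill(fields: List[Optional[str]]) -> List[Optional[str]]:
    """Hypothetical helper mimicking
    CASE WHEN f IS NULL THEN LAG(f) ELSE f END on an ordered list."""
    # LAG(f): the value from exactly one row earlier (None for the first row).
    lagged = [None] + fields[:-1]
    return [value if value is not None else prev
            for value, prev in zip(fields, lagged)]


if __name__ == "__main__":
    # The second null in the run stays null because its
    # immediate predecessor is also null.
    print(lag_fill(["one", None, None, "one"]))
```

Filling a whole run of nulls requires looking back to the last non-null value rather than to the previous row.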

Edit: I added rows to the data and changed the row numbers.

You can use the following logic to achieve your goal. Sample data creation:

WITH
Base AS
(
SELECT *
FROM(
SELECT 123 Row, 'one'  Field, '10:00 AM' DateTime  
UNION ALL
SELECT 123, null, '10:05 AM'     
UNION ALL
SELECT
123,    null,  '10:10 AM'
UNION ALL
SELECT
123   ,   'one'  , '10:30 AM'
UNION ALL
SELECT
456,null,'11:00 AM'
UNION ALL
SELECT
456,'two','11:15 AM'
UNION ALL
SELECT
789,'two','11:30 AM'))

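Logic: the query gets the minimum and maximum DateTime for each Field value, plus the LAG and LEAD values of Field for each row, and determines the prevField value from those.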

SELECT a.Field,DateTime,
CASE WHEN a.DateTime = a.min_date THEN ''
WHEN  a.lag_field IS NOT NULL and a.lead_field IS NULL THEN a.lag_field
WHEN  a.lag_field IS NULL and a.lead_field IS NOT NULL THEN a.lead_field
WHEN a.lag_field != a.lead_field THEN a.lag_field
WHEN a.Field IS NOT NULL AND a.lag_field IS NULL AND a.lead_field IS NULL AND a.DateTime = a.Max_date THEN a.Field
ELSE ''
END as prevField
FROM(
SELECT Base.Field,DateTime,LAG(Base.Field) over (order by DateTime)lag_field,Lead(Base.Field) over (order by DateTime) lead_field,min_date,Max_date
From Base LEFT JOIN (SELECT Field,MIN(DateTime) min_date,MAX(DateTime) Max_date FROM Base Group by Field) b
ON Base.Field = b.Field
) a

This query partially solved my problem:

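    -- Encode x as a 16-character zero-padded hex string, so that string order matches numeric order.
    CREATE TEMP FUNCTION ToHex(x INT64) AS (
      (SELECT STRING_AGG(FORMAT('%02x', x >> (byte * 8) & 0xff), '' ORDER BY byte DESC)
       FROM UNNEST(GENERATE_ARRAY(0, 7)) AS byte)
    );
    SELECT
         DateTime
        , Field
        -- MAX over the prefixed strings picks the latest preceding non-null Field; SUBSTR drops the 16-character prefix.
        , SUBSTR(MAX( ToHex(row_n) || Field) OVER (ORDER BY row_n ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 17) AS previous
    FROM (
        SELECT *, ROW_NUMBER() over (ORDER BY DateTime) AS row_n
        FROM `xx.yy.zz`
    );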
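The trick in the query above appears to be this: each non-null Field is prefixed with a fixed-width hex row number, so taking the MAX over all preceding rows returns the most recent non-null value, and the prefix is then stripped. A small Python sketch of the same idea follows; `previous_non_null` is a hypothetical name, and the list position stands in for `row_n`.

```python
from typing import List, Optional


def previous_non_null(fields: List[Optional[str]]) -> List[Optional[str]]:
    """Hypothetical helper: for each row, return the last non-null value
    strictly before it, using the 'sortable prefix + MAX' trick."""
    # Prefix each non-null value with a 16-digit hex row number (1-based).
    # Null values stay None, just as NULL || string is NULL in SQL.
    keyed = [
        None if value is None else f"{row_n:016x}{value}"
        for row_n, value in enumerate(fields, start=1)
    ]

    result: List[Optional[str]] = []
    for i in range(len(fields)):
        # Frame: UNBOUNDED PRECEDING to 1 PRECEDING; MAX skips nulls.
        candidates = [k for k in keyed[:i] if k is not None]
        # Drop the 16-character prefix, like SUBSTR(..., 17) in SQL.
        result.append(max(candidates)[16:] if candidates else None)
    return result


if __name__ == "__main__":
    print(previous_non_null(["one", None, "two", None]))
```

Because this only looks backwards, it never compares against the next non-null value, which may be why it solves only the first part of the problem.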
