Select the last non-null value and append it to another column BigQuery/PYTHON

Question

I have a table in BigQuery (BQ) that looks like this:

Row  Field  DateTime
1    one    10:00 AM
2    null   10:05 AM
3    null   10:10 AM
4    one    10:30 AM
5    null   11:00 AM
6    two    11:15 AM
7    two    11:30 AM
8    null   11:35 AM
9    null   11:40 AM
10   null   11:50 AM
11   null   12:00 AM
12   null   12:15 AM
13   two    12:30 AM
14   null   12:15 AM
15   null   12:25 AM
16   null   12:35 AM
17   three  12:55 AM

I want to create another column called prevField and fill it with the last non-null Field value when the non-null entries immediately before and after a run of nulls are the same. When those entries differ, prevField should remain null. The result would look like this (non-null rows simply repeat their own Field value, except row 1):

Row  Field  DateTime  prevField
1    one    10:00 AM  null
2    null   10:05 AM  one
3    null   10:10 AM  one
4    one    10:30 AM  one
5    null   11:00 AM  null
6    two    11:15 AM  two
7    two    11:30 AM  two
8    null   11:35 AM  two
9    null   11:40 AM  two
10   null   11:50 AM  two
11   null   12:00 AM  two
12   null   12:15 AM  two
13   two    12:30 AM  two
14   null   12:15 AM  null
15   null   12:25 AM  null
16   null   12:35 AM  null
17   three  12:55 AM  three
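A plain-Python sketch of the rule, handy for checking any SQL attempt against the expected output above. It assumes the rows are already sorted by DateTime and uses None for NULL; the treatment of non-null rows (own value, except row 1) is read off the expected table rather than stated in the question, and the function name fill_prev_field is hypothetical. The two passes compute the same columns that, in BigQuery, LAST_VALUE(Field IGNORE NULLS) over ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING and FIRST_VALUE(Field IGNORE NULLS) over ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING should return.

```python
from typing import List, Optional


def fill_prev_field(fields: List[Optional[str]]) -> List[Optional[str]]:
    """Compute prevField for rows already sorted by DateTime (None = NULL).

    Assumed rules, read off the expected output in the question:
    - a non-null row takes its own Field value, except the first row (None);
    - a null row takes the last non-null value before it, but only when the
      next non-null value after it is the same; otherwise it stays None.
    """
    n = len(fields)

    # prev_seen[i]: last non-null value strictly before row i
    prev_seen: List[Optional[str]] = [None] * n
    last: Optional[str] = None
    for i in range(n):
        prev_seen[i] = last
        if fields[i] is not None:
            last = fields[i]

    # next_seen[i]: first non-null value strictly after row i
    next_seen: List[Optional[str]] = [None] * n
    nxt: Optional[str] = None
    for i in range(n - 1, -1, -1):
        next_seen[i] = nxt
        if fields[i] is not None:
            nxt = fields[i]

    result: List[Optional[str]] = []
    for i, value in enumerate(fields):
        if value is not None:
            result.append(value if i > 0 else None)
        elif prev_seen[i] is not None and prev_seen[i] == next_seen[i]:
            result.append(prev_seen[i])
        else:
            result.append(None)
    return result


# Field column from the question, in Row order
fields = ["one", None, None, "one", None, "two", "two", None, None, None,
          None, None, "two", None, None, None, "three"]
print(fill_prev_field(fields))
```

On the question's data this reproduces the prevField column above, including the null runs between 'one'/'two' and 'two'/'three'.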

So far I have tried the following variations for the first part of the question (filling prevField with the last non-null Field value when the entries around the nulls are the same), without success:

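-- Variation 1: fall back to the previous row when Field is null
-- (LAG looks back exactly one row, so only the first null of a run gets a value)
SELECT Field, DateTime,
  CASE WHEN Field IS NULL THEN LAG(Field) OVER (ORDER BY DateTime) ELSE Field END AS prevField
FROM `table`

-- Variation 2: last non-null Field from the current row to the end of the table
SELECT Field, DateTime,
  LAST_VALUE(Field IGNORE NULLS) OVER (ORDER BY DateTime
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS prevField
FROM `table`

-- Variation 3: with the default frame this returns the first row's Field for every row
SELECT Field, DateTime,
  FIRST_VALUE(Field) OVER (ORDER BY DateTime) AS prevField
FROM `table`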
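The CASE-with-LAG variation can only fill the first null of each run: LAG looks back exactly one row, and the row before the second null is itself null. The plain-Python sketch below contrasts that with a "last non-null value before this row" carry-forward, which is what LAST_VALUE(Field IGNORE NULLS) should return over the frame ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. The helper names are hypothetical and None stands for NULL.

```python
from typing import List, Optional


def lag(values: List[Optional[str]]) -> List[Optional[str]]:
    """Like LAG(x) OVER (ORDER BY ...): the value exactly one row back."""
    return ([None] + values[:-1]) if values else []


def last_non_null_before(values: List[Optional[str]]) -> List[Optional[str]]:
    """Like LAST_VALUE(x IGNORE NULLS) OVER (ORDER BY ...
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)."""
    result: List[Optional[str]] = []
    last: Optional[str] = None
    for v in values:
        result.append(last)  # only rows strictly before this one count
        if v is not None:
            last = v
    return result


fields = ["one", None, None, "one", None, "two"]  # first six rows of the question

# The CASE/LAG variation: keep Field, fall back to the previous row when null
case_lag = [f if f is not None else p for f, p in zip(fields, lag(fields))]
print(case_lag)                      # ['one', 'one', None, 'one', 'one', 'two']
print(last_non_null_before(fields))  # [None, 'one', 'one', 'one', 'one', 'one']
```

Carry-forward alone fixes the consecutive-null problem but still fills the 11:00 AM row, because it does not look at the next non-null value.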

EDIT: I added rows to the data and changed the row numbers.

Answer 1

You can use the following logic to achieve your goal. Sample data creation:

WITH
Base AS (
  SELECT 123 AS Row, 'one' AS Field, '10:00 AM' AS DateTime
  UNION ALL SELECT 123, NULL,  '10:05 AM'
  UNION ALL SELECT 123, NULL,  '10:10 AM'
  UNION ALL SELECT 123, 'one', '10:30 AM'
  UNION ALL SELECT 456, NULL,  '11:00 AM'
  UNION ALL SELECT 456, 'two', '11:15 AM'
  UNION ALL SELECT 789, 'two', '11:30 AM'
)

Logic: the query computes the earliest and latest DateTime for each Field value, plus the previous (LAG) and next (LEAD) Field value for each row, and derives prevField from those.

SELECT
  a.Field,
  a.DateTime,
  CASE
    WHEN a.DateTime = a.min_date THEN ''  -- earliest row of each non-null Field value
    WHEN a.lag_field IS NOT NULL AND a.lead_field IS NULL THEN a.lag_field
    WHEN a.lag_field IS NULL AND a.lead_field IS NOT NULL THEN a.lead_field
    WHEN a.lag_field != a.lead_field THEN a.lag_field
    WHEN a.Field IS NOT NULL AND a.lag_field IS NULL AND a.lead_field IS NULL
         AND a.DateTime = a.Max_date THEN a.Field
    ELSE ''
  END AS prevField
FROM (
  -- Previous/next Field per row, plus the first/last DateTime of the row's Field value
  -- (NULL Fields never match the join condition, so their min_date/Max_date are NULL)
  SELECT Base.Field, Base.DateTime,
         LAG(Base.Field) OVER (ORDER BY Base.DateTime) AS lag_field,
         LEAD(Base.Field) OVER (ORDER BY Base.DateTime) AS lead_field,
         b.min_date, b.Max_date
  FROM Base
  LEFT JOIN (
    SELECT Field, MIN(DateTime) AS min_date, MAX(DateTime) AS Max_date
    FROM Base
    GROUP BY Field
  ) b ON Base.Field = b.Field
) a
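To see how the CASE expression behaves row by row, here is a hedged plain-Python re-evaluation of the query on the sample rows from the CTE. It assumes DateTime is compared as a string (as it is in the sample) and models NULL with None, keeping in mind that an SQL comparison involving NULL is never true and that a NULL Field never matches the LEFT JOIN. The function name emulate_answer1 is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Sample data from the CTE as (Field, DateTime); None plays the role of NULL
SAMPLE: List[Tuple[Optional[str], str]] = [
    ("one", "10:00 AM"), (None, "10:05 AM"), (None, "10:10 AM"),
    ("one", "10:30 AM"), (None, "11:00 AM"), ("two", "11:15 AM"),
    ("two", "11:30 AM"),
]


def emulate_answer1(rows: List[Tuple[Optional[str], str]]) -> List[str]:
    """Re-evaluate the CASE expression for each row (a sketch)."""
    rows = sorted(rows, key=lambda r: r[1])  # ORDER BY DateTime (string order)

    # Subquery b: MIN/MAX DateTime per non-null Field value
    bounds: Dict[Optional[str], Tuple[Optional[str], Optional[str]]] = {}
    for field, dt in rows:
        if field is not None:
            lo, hi = bounds.get(field, (dt, dt))
            bounds[field] = (min(lo, dt), max(hi, dt))

    result: List[str] = []
    for i, (field, dt) in enumerate(rows):
        lag = rows[i - 1][0] if i > 0 else None
        lead = rows[i + 1][0] if i + 1 < len(rows) else None
        # NULL Fields are never stored in bounds, so they get (None, None)
        min_date, max_date = bounds.get(field, (None, None))
        # Explicit None checks mirror "comparison with NULL is never true"
        if min_date is not None and dt == min_date:
            result.append("")
        elif lag is not None and lead is None:
            result.append(lag)
        elif lag is None and lead is not None:
            result.append(lead)
        elif lag is not None and lead is not None and lag != lead:
            result.append(lag)
        elif field is not None and lag is None and lead is None and dt == max_date:
            result.append(field)
        else:
            result.append("")
    return result


print(emulate_answer1(SAMPLE))  # ['', 'one', 'one', 'one', 'one', '', 'two']
```

On this sample the emulation returns '' for the 10:00 AM and 11:15 AM rows and 'one' for the 11:00 AM row, where the expected output in the question has null, two and null.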

Answer 2

This query partly solves my problem:

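-- Encodes an INT64 as a 16-character, zero-padded hex string, so that for
-- non-negative values the string order of the prefix matches numeric order.
CREATE TEMP FUNCTION ToHex(x INT64) AS (
  (SELECT STRING_AGG(FORMAT('%02x', x >> (byte * 8) & 0xff), '' ORDER BY byte DESC)
   FROM UNNEST(GENERATE_ARRAY(0, 7)) AS byte)
);

SELECT
  DateTime,
  Field,
  -- ToHex(row_n) || Field is NULL when Field is NULL and MAX ignores NULLs, so the
  -- MAX over all earlier rows is the latest non-null Field; SUBSTR(..., 17) drops the prefix.
  SUBSTR(MAX(ToHex(row_n) || Field)
         OVER (ORDER BY row_n ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 17) AS previous
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY DateTime) AS row_n
  FROM `xx.yy.zz`
);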
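The trick in this query is that MAX over strings picks the lexicographically largest value, so prefixing each Field with a fixed-width hex row number makes the largest string the one from the latest non-null row, while NULL Fields drop out because concatenating NULL yields NULL. A plain-Python emulation, assuming row_n is non-negative; the helper names are hypothetical and None stands for NULL:

```python
from typing import List, Optional


def to_hex(x: int) -> str:
    """Zero-padded 16-character lowercase hex, as ToHex builds for a non-negative INT64."""
    return format(x, "016x")


def previous_non_null(fields: List[Optional[str]]) -> List[Optional[str]]:
    """Emulates SUBSTR(MAX(ToHex(row_n) || Field) OVER (ORDER BY row_n
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 17)."""
    result: List[Optional[str]] = []
    running_max: Optional[str] = None
    for row_n, field in enumerate(fields, start=1):
        # The frame stops one row before the current row: emit first, then update
        result.append(running_max[16:] if running_max is not None else None)
        if field is not None:  # NULL || x is NULL, and MAX ignores NULLs
            key = to_hex(row_n) + field
            if running_max is None or key > running_max:
                running_max = key
    return result


fields = ["one", None, None, "one", None, "two", "two", None, None, None,
          None, None, "two", None, None, None, "three"]
print(previous_non_null(fields))

# Why fixed width: plain decimal prefixes would not sort in row order
print("10two" > "9one")                        # False
print(to_hex(10) + "two" > to_hex(9) + "one")  # True
```

This covers only the "last non-null value before the row" part, not the check that the next non-null value matches, which fits "partly solves". In BigQuery, LAST_VALUE(Field IGNORE NULLS) OVER (ORDER BY row_n ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) should produce the same column without the hex encoding.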

Answer 1: votes 1, accepted, 2020-08-05 05:13:26

Answer 2: votes 0, 2020-08-05 11:58:44
