Select last non-null value and append it to another column BigQuery/PYTHON

I have a table in BQ that looks like this:

Row     Field  DateTime  
1        one   10:00 AM    
2        null  10:05 AM     
3        null  10:10 AM    
4        one   10:30 AM    
5        null  11:00 AM    
6        two   11:15 AM    
7        two   11:30 AM 
8        null  11:35 AM
9        null  11:40 AM
10       null  11:50 AM
11       null  12:00 AM
12       null  12:15 AM
13       two   12:30 AM
14       null  12:15 AM
15       null  12:25 AM
16       null  12:35 AM
17       three 12:55 AM     

I want to create another column called prevField and fill it with the last non-null Field value when the non-null entries immediately before and after a run of nulls are the same. When those two entries differ, prevField should remain null. The result would look like the following:

  Row     Field    DateTime  prevField
    1        one   10:00 AM   null 
    2        null  10:05 AM   one    
    3        null  10:10 AM   one   
    4        one   10:30 AM   one
    5        null  11:00 AM   null
    6        two   11:15 AM   two
    7        two   11:30 AM   two
    8        null  11:35 AM   two
    9        null  11:40 AM   two
    10       null  11:50 AM   two
    11       null  12:00 AM   two
    12       null  12:15 AM   two
    13       two   12:30 AM   two 
    14       null  12:15 AM   null
    15       null  12:25 AM   null
    16       null  12:35 AM   null
    17       three 12:55 AM   three

So far I have tried the following code variations for the first part of the question (fill prevField with the last non-null Field value when the entries before and after the nulls are the same), but without success.

select Field, DateTime,

-- (1) case when FieldName is null then LAG(FieldName) over (order by DateTime) else FieldName end as prevFieldName

-- (2) LAST_VALUE(FieldName IGNORE NULLS) OVER (ORDER BY DateTime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS prevFieldName

-- (3) first_value(FieldName) over (order by DateTime) as prevFieldName

from table
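
To see why each of these expressions falls short, it can help to simulate them on a short list. The following is a minimal Python sketch, not BigQuery code: the function names are hypothetical, the list is assumed to be already sorted by DateTime with no ties, and each function only mimics the window expression named in its docstring.

```python
from typing import List, Optional

Value = Optional[str]


def lag_fallback(values: List[Value]) -> List[Value]:
    """Mimics CASE WHEN x IS NULL THEN LAG(x) ELSE x END: looks back one row only."""
    return [values[i - 1] if v is None and i > 0 else v
            for i, v in enumerate(values)]


def last_non_null_following(values: List[Value]) -> List[Value]:
    """Mimics LAST_VALUE(x IGNORE NULLS) over CURRENT ROW .. UNBOUNDED FOLLOWING."""
    out: List[Value] = []
    for i in range(len(values)):
        non_null = [v for v in values[i:] if v is not None]
        out.append(non_null[-1] if non_null else None)
    return out


def first_value_running(values: List[Value]) -> List[Value]:
    """Mimics FIRST_VALUE(x) with the default frame: always the first row's value."""
    return [values[0] for _ in values]


sample: List[Value] = ["one", None, None, "one", None, "two"]

lag_result = lag_fallback(sample)
last_result = last_non_null_following(sample)
first_result = first_value_running(sample)
```

On this sample, the LAG variant leaves the second of two consecutive nulls unfilled, the LAST_VALUE variant returns the final non-null value of the whole remaining table for every row, and the FIRST_VALUE variant repeats the first row's value everywhere.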

EDIT: I added rows to the data and changed the row numbers.

You can use the following logic to achieve your goal. Sample data creation:

WITH
Base AS
(
SELECT *
FROM (
SELECT 123 Row, 'one' Field, '10:00 AM' DateTime
UNION ALL
SELECT 123, null, '10:05 AM'
UNION ALL
SELECT 123, null, '10:10 AM'
UNION ALL
SELECT 123, 'one', '10:30 AM'
UNION ALL
SELECT 456, null, '11:00 AM'
UNION ALL
SELECT 456, 'two', '11:15 AM'
UNION ALL
SELECT 789, 'two', '11:30 AM'))

Logic: The query computes the minimum and maximum DateTime for each Field value, as well as the lag and lead Field values for each row, and derives the prevField value from these.

SELECT a.Field, DateTime,
CASE WHEN a.DateTime = a.min_date THEN ''
WHEN a.lag_field IS NOT NULL AND a.lead_field IS NULL THEN a.lag_field
WHEN a.lag_field IS NULL AND a.lead_field IS NOT NULL THEN a.lead_field
WHEN a.lag_field != a.lead_field THEN a.lag_field
WHEN a.Field IS NOT NULL AND a.lag_field IS NULL AND a.lead_field IS NULL AND a.DateTime = a.Max_date THEN a.Field
ELSE ''
END AS prevField
FROM (
SELECT Base.Field, DateTime,
LAG(Base.Field) OVER (ORDER BY DateTime) lag_field,
LEAD(Base.Field) OVER (ORDER BY DateTime) lead_field,
min_date, Max_date
FROM Base
LEFT JOIN (SELECT Field, MIN(DateTime) min_date, MAX(DateTime) Max_date FROM Base GROUP BY Field) b
ON Base.Field = b.Field
) a

This query partly solves my problem:

    -- Encode an INT64 as 16 hex digits so that string order matches numeric order.
    CREATE TEMP FUNCTION ToHex(x INT64) AS (
      (SELECT STRING_AGG(FORMAT('%02x', x >> (byte * 8) & 0xff), '' ORDER BY byte DESC)
       FROM UNNEST(GENERATE_ARRAY(0, 7)) AS byte)
    );
    SELECT
          DateTime
        , Field
          -- Null Fields make the concatenation null, so MAX picks the latest non-null row before the current one.
        , SUBSTR(MAX(ToHex(row_n) || Field) OVER (ORDER BY row_n ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 17) AS previous
    FROM (
        SELECT *, ROW_NUMBER() OVER (ORDER BY DateTime) AS row_n
        FROM `xx.yy.zz`
    );
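
The query above returns the most recent non-null Field before each row. The remaining part of the requirement, keeping a value only when the nearest non-null Field on each side is the same, could be sketched in Python as below. This is only a sketch under stated assumptions: the name fill_between_equal is hypothetical, the rows are assumed to be already sorted (by Row here, because the sample DateTime values are not strictly increasing), and a non-null row counts as its own nearest value on both sides.

```python
from typing import List, Optional, Sequence

Value = Optional[str]


def fill_between_equal(values: Sequence[Value]) -> List[Value]:
    """Fill a run of nulls only when the non-null values on both sides agree."""
    n = len(values)

    # Nearest non-null value at or before each position (forward fill).
    prev_seen: List[Value] = [None] * n
    last: Value = None
    for i, v in enumerate(values):
        if v is not None:
            last = v
        prev_seen[i] = last

    # Nearest non-null value at or after each position (backward fill).
    next_seen: List[Value] = [None] * n
    nxt: Value = None
    for i in range(n - 1, -1, -1):
        if values[i] is not None:
            nxt = values[i]
        next_seen[i] = nxt

    # Keep the value only where both directions agree; otherwise stay null.
    return [p if p is not None and p == q else None
            for p, q in zip(prev_seen, next_seen)]


# Field column from the question, rows 1 to 17.
fields: List[Value] = [
    "one", None, None, "one", None, "two", "two",
    None, None, None, None, None, "two",
    None, None, None, "three",
]

prev_field = fill_between_equal(fields)
```

For the sample data this reproduces the expected prevField column for rows 2 to 17. Row 1 comes out as 'one', whereas the expected output in the question shows null there, so the first row would need a special case if that null is intended.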
