How do I flatten a RECORD field made up of repeated fields in BigQuery?

Question

My table has a nested object (RECORD) with several array (REPEATED) properties.

There are 4 nested properties:

BM.HISTORY.AutoBill.BaseRate
BM.HISTORY.AutoBill.Substatus
BM.HISTORY.AutoBill.Services
BM.HISTORY.AutoBill.Conditions

Each property contains data like this:

[
 {Label: "Completed....", Amount: 34.3, Quantity: 2}, 
 {Label: "Completed....", Amount: 34.3, Quantity: 2}
,...]

I want to extract a simple flat CSV like this:

OrderId  StartWindow                 Type       Label                      Quantity Amount
123      2022-07-17 13:07:00 UTC     BaseRate   Completed @ 100%           0        82.6
123      2022-07-17 13:07:00 UTC     Services   service 1                  1        16.1323
123      2022-07-17 13:07:00 UTC     Services   service 2                  1        5
123      2022-07-17 13:07:00 UTC     Conditions 10% Time Window Premium    0.826    8.26

234      2022-07-17 13:07:00 UTC     BaseRate   Completed @ 100%           0        3.6
234      2022-07-17 13:07:00 UTC     Services   service 1                  1        16.1323
234      2022-07-17 13:07:00 UTC     Services   service 2                  1        5
234      2022-07-17 13:07:00 UTC     Conditions 10% Time Window Premium    0.826    8.26

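StartWindow is a plain field of my table.
Type is a label I create from the BILL.XXXX property names above, and the other 3 columns are the flattened records from inside each array.
They need to be stacked as rows.

This is what I have so far, but I can't get everything flattened into a simple CSV.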

SELECT
  Id as OrderId, StartWindow, Billing
FROM
  `xxx.xxxx.xxDB ADDRESSxxx`,
  UNNEST([ 
      (BM.HISTORY.BILL.BaseRate,'BASE RATE'),
      (BM.HISTORY.BILL.Substatus,'SUBSTATUS'),
      (BM.HISTORY.BILL.Services,'SERVICES'),
      (BM.HISTORY.BILL.Conditions,'CONDITIONS')
    ]
  ) as Billing
LIMIT 10

As you can see, the result is not completely flattened.

Answer 1

Your code:

select x from 
 UNNEST([ 
      ([1,2,3],'BASE RATE'),
      ([5,6],'SUBSTATUS')
 ]) x

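This first builds an array with two entries and then unnests it into two rows. In each row, the first column is still an array and the second column is the string name.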
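To make the intermediate result concrete, here is a minimal plain-Python sketch of what one level of UNNEST does to that array of pairs. It only models the semantics; it does not call BigQuery, and the dictionary keys `values` and `name` are names chosen for illustration.

```python
# Plain-Python model of UNNEST over an array of (array, name) pairs.
pairs = [
    ([1, 2, 3], "BASE RATE"),
    ([5, 6], "SUBSTATUS"),
]

# One level of unnesting: one output row per pair.
# The inner array is carried along unchanged, so it is still nested.
rows = [{"values": values, "name": name} for values, name in pairs]

for row in rows:
    print(row)
```

The result has two rows, not five, which is why the output in the question still looks nested.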

So the first column needs to be unnested again. In the following example this is done by unnest(x.dataset) as dataset. Columns A to C stand in for your BaseRate, Substatus, and so on.

With tbl as 
(
Select  1 id, "name1" as name, [struct("label1" as label,500 as amount),("t",1)] A,[("B1",20)] B , [("C1",1)] C
union all select 2, "name2",[("A2",2)],[("B2",10)] , null
union all select 3,"-",[],null ,null
union all select 4,"-",[],null ,[("test4",100),("test5",500)]
)

Select id,type,dataset.* from tbl,
unnest([
  struct(A as dataset,'A' as type),struct(B,'B'),struct(C,'C')
]) as x, unnest(x.dataset) as dataset
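The two-level unnest above can be modelled in plain Python as two nested loops, which may help when checking what rows to expect. This is a sketch of the semantics only, using the same sample data as the query; the function name `flatten` is hypothetical, and BigQuery does not guarantee row order without an ORDER BY.

```python
# Sample rows mirroring the tbl CTE; None models SQL NULL.
tbl = [
    {"id": 1, "A": [("label1", 500), ("t", 1)], "B": [("B1", 20)], "C": [("C1", 1)]},
    {"id": 2, "A": [("A2", 2)], "B": [("B2", 10)], "C": None},
    {"id": 3, "A": [], "B": None, "C": None},
    {"id": 4, "A": [], "B": None, "C": [("test4", 100), ("test5", 500)]},
]

def flatten(rows, columns=("A", "B", "C")):
    """Stack every struct of every array column into one flat row."""
    out = []
    for row in rows:
        # First unnest: one step per (array, type) pair.
        for col in columns:
            # Second unnest: NULL and empty arrays contribute no rows.
            for label, amount in row[col] or []:
                out.append((row["id"], col, label, amount))
    return out

flat = flatten(tbl)
for r in flat:
    print(r)
```

Note that id 3 does not appear in the output at all, because all of its arrays are empty or NULL and the comma join behaves like a cross join.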

Another way to approach this task is to unnest each record separately:

SELECT Id, BaseRate.*, Substatus, Services, Conditions
FROM `xxx.xxxx.xxDB ADDRESSxxx`,
  UNNEST(BM.HISTORY.BILL.BaseRate) as BaseRate,
  UNNEST(BM.HISTORY.BILL.Substatus) as Substatus,
  UNNEST(BM.HISTORY.BILL.Services) as Services,
  UNNEST(BM.HISTORY.BILL.Conditions) as Conditions

However, this returns every combination of all entries, that is, the cross product of the arrays.
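A small Python sketch shows how quickly the combinations multiply. The array contents here are made-up placeholders, not data from the question, and `itertools.product` is used only as a model of the chained comma joins.

```python
from itertools import product

# Hypothetical array contents for a single table row.
base_rate = ["b1", "b2"]
services = ["s1", "s2", "s3"]
conditions = ["c1"]

# Chained comma joins on UNNEST behave like a cross product.
combos = list(product(base_rate, services, conditions))
print(len(combos))  # 2 * 3 * 1 combinations

# If any one array is empty, the whole row disappears.
print(list(product(base_rate, [], conditions)))
```

So besides producing unwanted combinations, this form also drops any row where one of the arrays is empty.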

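So a join is needed as well, and each unnest is done WITH OFFSET to obtain the entry number.

I will show the route with a simple example. Table tbl has an id column; columns A and B contain arrays of plain values, and column C contains an array of structs. Your BaseRate column therefore corresponds to column C.

Each row of tbl has to be shown several times so that all entries of columns A, B, and C are included.

We query table tbl and compute how many times the row has to be shown: the GREATEST of the array_length of columns A, B, and C gives this number. unnest and generate_array duplicate the row that many times for each id and add an entry column that counts up from zero.

For column A, the values are unnested and joined on the entry column. The same is done for columns B and C.

Because of the join on the entry column, entries of different columns are not combined with each other.

Since C_ is a struct, C_.* unpacks it and returns all of its fields directly.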

With tbl as 
(
Select  1 id, "name1" as name, [1,2,3,4,5] A,[10,11] B , [struct("label1" as label,500 as amount),("t",1)] C
union all select 2, "name2",[7,8],[20,21,22] , null
union all select 3,"-",[],null ,null
union all select 4,"-",[],null ,[("test",100),("test5",500)]
)

select id,entry,name,A_,B_ ,C_.*
from tbl,
unnest(generate_array(0,GREATEST(1,ifnull(array_length(A),0),ifnull(array_length(B),0),ifnull(array_length(C),0))-1)) as entry

left join 
unnest(A) as A_ WITH OFFSET as A_offset
on A_offset=entry

left join 
unnest(B) as B_ WITH OFFSET as B_offset
on B_offset=entry

left join 
unnest(C) as C_ WITH OFFSET as C_offset
on C_offset=entry
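The offset join can be modelled in plain Python with `itertools.zip_longest`, which pads the shorter arrays with None in the same way the left joins leave NULL where an offset has no match. This is a sketch of the semantics under that assumption, not BigQuery code; the function name `align` is hypothetical, and C_ stays a tuple here instead of being expanded as C_.* does.

```python
from itertools import zip_longest

# Sample rows mirroring the tbl CTE; None models SQL NULL.
tbl = [
    {"id": 1, "A": [1, 2, 3, 4, 5], "B": [10, 11], "C": [("label1", 500), ("t", 1)]},
    {"id": 2, "A": [7, 8], "B": [20, 21, 22], "C": None},
    {"id": 3, "A": [], "B": None, "C": None},
    {"id": 4, "A": [], "B": None, "C": [("test", 100), ("test5", 500)]},
]

def align(row):
    """Pair up entries of A, B, and C by position (offset)."""
    a, b, c = (row[col] or [] for col in ("A", "B", "C"))
    # The longest array sets the row count; gaps become None.
    aligned = list(zip_longest(a, b, c))
    # GREATEST(1, ...) in the query keeps one row even if all arrays are empty.
    if not aligned:
        aligned = [(None, None, None)]
    return [(row["id"], entry, a_, b_, c_)
            for entry, (a_, b_, c_) in enumerate(aligned)]

for row in tbl:
    for out in align(row):
        print(out)
```

Unlike the cross join, this keeps id 3 as a single row of NULLs and never pairs an entry of one array with an unrelated entry of another.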

Answer 1: 2 votes, accepted, 2022-08-21 13:28:10