How to flatten an array of structs into columns in Google BigQuery

Question

Consider the example table below, where data has the type ARRAY<STRUCT<key STRING, value STRING>> with the repeated keys "Date", "Country", and "Brand":

source	data.key	data.value
first_file	Date	2022-12-14
	Country	Germany
	Brand	Mercedes
	Date	2022-12-15
	Country	Germany
	Brand	BMW
second_file	Date	2022-12-13
	Country	Sweden
	Brand	Volvo
	Date	2022-12-10
	Country	France
	Brand	Renault

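By "repeated" keys I mean that the data.key entries always consist of these keys (Date, Country, Brand). In this example the set of keys is repeated twice per row, but in the real table it may be repeated more times per unique entry. The result I would like to get is: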

source	Date	Country	Brand
first_file	2022-12-14	Germany	Mercedes
first_file	2022-12-15	Germany	BMW
second_file	2022-12-13	Sweden	Volvo
second_file	2022-12-10	France	Renault

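Any help on how to get to that result?

If it helps, I have already managed to convert the sample table into the following format, in case you want to try a solution against this table instead: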

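source	Date.key	Date.value	Country.key	Country.value	Brand.key	Brand.value
first_file	Date	2022-12-14	Country	Germany	Brand	Mercedes
first_file	Date	2022-12-15	Country	Germany	Brand	BMW
second_file	Date	2022-12-13	Country	Sweden	Brand	Volvo
second_file	Date	2022-12-10	Country	France	Brand	Renault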

Thanks!

Answer 1

WITH
  tmp AS (
  SELECT
    source,
    key,
    value
  FROM
    UNNEST(ARRAY<STRUCT<source string, data ARRAY<STRUCT<key string, value string>>>>[ 
      ("first_file", [("Date","2022-12-14"), ("Country","Germany"),("Brand","Mercedes")]),
      ("second_file", [("Date","2022-12-13"), ("Country","Sweden"),("Brand","Volvo")])
      ]),
    UNNEST(data) ) -- unnest data first
SELECT
  source, date[SAFE_OFFSET(0)] as date, country[SAFE_OFFSET(0)] as country, brand[SAFE_OFFSET(0)] as brand, 
FROM
  tmp PIVOT (
    ARRAY_AGG(value IGNORE NULLS) FOR key IN ("Date", "Country", "Brand")) -- pivot table
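
The sample data in the query above holds a single Date/Country/Brand set per source. For the repeated keys described in the question, one possible sketch is to number the unnested entries with WITH OFFSET and pivot once per group of three. This assumes every record occupies exactly three consecutive array entries; the names `sources`, `pos`, and `grp` are illustrative, and MAX is used only because each group holds a single value per key:

```sql
WITH sources AS (
  SELECT * FROM UNNEST(ARRAY<STRUCT<source STRING, data ARRAY<STRUCT<key STRING, value STRING>>>>[
    ("first_file",  [("Date","2022-12-14"), ("Country","Germany"), ("Brand","Mercedes"),
                     ("Date","2022-12-15"), ("Country","Germany"), ("Brand","BMW")]),
    ("second_file", [("Date","2022-12-13"), ("Country","Sweden"), ("Brand","Volvo"),
                     ("Date","2022-12-10"), ("Country","France"), ("Brand","Renault")])
  ])
),
tmp AS (
  SELECT
    source,
    DIV(pos, 3) AS grp,  -- assumption: each record is 3 consecutive entries
    d.key,
    d.value
  FROM sources, UNNEST(data) AS d WITH OFFSET AS pos
)
SELECT source, Date, Country, Brand
FROM tmp
PIVOT (MAX(value) FOR key IN ("Date", "Country", "Brand"))  -- groups by source and grp
ORDER BY source, grp
```

Because `source` and `grp` are the only columns left outside the PIVOT clause, they act as the grouping columns, so each group of three entries should come back as its own row.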

Answer 2

You can use the following query:

with sources AS 
(
  select 
    'first_file' as source,
    [
      struct('Date' as key, '2022-12-14' as value),
      struct('Country' as key, 'Germany' as value),
      struct('Brand' as key, 'Mercedes' as value)
    ] as data
  UNION ALL
  select 
    'second_file' as source,
    [
      struct('Date' as key, '2022-12-13' as value),
      struct('Country' as key, 'Sweden' as value),
      struct('Brand' as key, 'Volvo' as value)
    ] as data
)

select
  source,
  (SELECT value FROM UNNEST(data) WHERE key = 'Date') AS Date,
  (SELECT value FROM UNNEST(data) WHERE key = 'Country') AS Country,
  (SELECT value FROM UNNEST(data) WHERE key = 'Brand') AS Brand,
from sources;

You can also use a UDF to centralize the logic:

CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
((SELECT value FROM UNNEST(arr) WHERE key = k));

with sources AS 
(
  select 
    'first_file' as source,
    [
      struct('Date' as key, '2022-12-14' as value),
      struct('Country' as key, 'Germany' as value),
      struct('Brand' as key, 'Mercedes' as value)
    ] as data
  UNION ALL
  select 
    'second_file' as source,
    [
      struct('Date' as key, '2022-12-13' as value),
      struct('Country' as key, 'Sweden' as value),
      struct('Brand' as key, 'Volvo' as value)
    ] as data
)
SELECT 
  source,
  getValue('Date', data) AS Date,
  getValue('Country', data) AS Country,
  getValue('Brand', data) AS Brand
FROM sources;
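
One caveat: a scalar subquery such as `(SELECT value FROM UNNEST(data) WHERE key = 'Date')` raises an error when it returns more than one row, which would happen with the repeated keys from the question. A minimal sketch of one way around this is to collect each key's values into an array ordered by position and then line the arrays up by index. It assumes every key occurs the same number of times, in the same record order, and that no value is NULL; `per_key`, `pos`, and `i` are illustrative names:

```sql
WITH sources AS (
  SELECT
    'first_file' AS source,
    [
      STRUCT('Date' AS key, '2022-12-14' AS value),
      STRUCT('Country' AS key, 'Germany' AS value),
      STRUCT('Brand' AS key, 'Mercedes' AS value),
      STRUCT('Date' AS key, '2022-12-15' AS value),
      STRUCT('Country' AS key, 'Germany' AS value),
      STRUCT('Brand' AS key, 'BMW' AS value)
    ] AS data
),
per_key AS (
  SELECT
    source,
    -- one array per key, kept in the original array order
    ARRAY(SELECT value FROM UNNEST(data) WITH OFFSET AS pos WHERE key = 'Date' ORDER BY pos) AS dates,
    ARRAY(SELECT value FROM UNNEST(data) WITH OFFSET AS pos WHERE key = 'Country' ORDER BY pos) AS countries,
    ARRAY(SELECT value FROM UNNEST(data) WITH OFFSET AS pos WHERE key = 'Brand' ORDER BY pos) AS brands
  FROM sources
)
SELECT
  source,
  date_value AS Date,
  countries[SAFE_OFFSET(i)] AS Country,  -- NULL if the arrays differ in length
  brands[SAFE_OFFSET(i)] AS Brand
FROM per_key, UNNEST(dates) AS date_value WITH OFFSET AS i
ORDER BY source, i
```

Unlike grouping by a fixed chunk size, this does not depend on the keys appearing in a fixed Date, Country, Brand order inside each record, only on each key's n-th occurrence belonging to the n-th record.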

