How to flatten an Array Struct to columns in Google BigQuery
Consider the example table below, where `data` is of type `array<struct<key:string,value:string>>` and has repeated keys: "Date", "Country", and "Brand":
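source | data.key | data.value |
---|---|---|
first_file | Date | 2022-12-14 |
| | Country | Germany |
| | Brand | Mercedes |
| | Date | 2022-12-15 |
| | Country | Germany |
| | Brand | BMW |
second_file | Date | 2022-12-13 |
| | Country | Sweden |
| | Brand | Volvo |
| | Date | 2022-12-10 |
| | Country | France |
| | Brand | Renault |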
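By "repeated" keys I mean that the data.key entries always consist of these same keys (Date, Country, Brand). In this example each set of keys appears twice per row, but in the actual table it may appear many more times. The result I want is: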
source | Date | Country | Brand |
---|---|---|---|
first_file | 2022-12-14 | Germany | Mercedes |
first_file | 2022-12-15 | Germany | BMW |
second_file | 2022-12-13 | Sweden | Volvo |
second_file | 2022-12-10 | France | Renault |
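Any help on how to get to that result?
If it helps, I have already managed to convert the sample table into the following format, in case you would rather try a solution against this table: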
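source | Date.key | Date.value | Country.key | Country.value | Brand.key | Brand.value |
---|---|---|---|---|---|---|
first_file | Date | 2022-12-14 | Country | Germany | Brand | Mercedes |
first_file | Date | 2022-12-15 | Country | Germany | Brand | BMW |
second_file | Date | 2022-12-13 | Country | Sweden | Brand | Volvo |
second_file | Date | 2022-12-10 | Country | France | Brand | Renault |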
Thanks!
WITH tmp AS (
  -- One output row per (source, key, value) triple
  SELECT
    source,
    key,
    value
  FROM
    UNNEST(ARRAY<STRUCT<source STRING, data ARRAY<STRUCT<key STRING, value STRING>>>>[
      ("first_file", [("Date", "2022-12-14"), ("Country", "Germany"), ("Brand", "Mercedes")]),
      ("second_file", [("Date", "2022-12-13"), ("Country", "Sweden"), ("Brand", "Volvo")])
    ]),
    UNNEST(data)  -- unnest data first
)
SELECT
  source,
  date[SAFE_OFFSET(0)] AS date,        -- first value collected for each key
  country[SAFE_OFFSET(0)] AS country,
  brand[SAFE_OFFSET(0)] AS brand
FROM
  tmp PIVOT (
    ARRAY_AGG(value IGNORE NULLS) FOR key IN ("Date", "Country", "Brand")  -- pivot table
  )
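The sample data in the query above holds only one Date/Country/Brand set per source, whereas the question's table holds several. A minimal sketch for that case follows. It assumes (this is not stated in the original) that every record contributes exactly three consecutive array entries, so the array position divided by 3 identifies the record. The names `sources`, `pos`, and `record_idx` are illustrative only.

```sql
WITH sources AS (
  SELECT 'first_file' AS source, [
    STRUCT('Date' AS key, '2022-12-14' AS value),
    STRUCT('Country' AS key, 'Germany' AS value),
    STRUCT('Brand' AS key, 'Mercedes' AS value),
    STRUCT('Date' AS key, '2022-12-15' AS value),
    STRUCT('Country' AS key, 'Germany' AS value),
    STRUCT('Brand' AS key, 'BMW' AS value)
  ] AS data
  UNION ALL
  SELECT 'second_file' AS source, [
    STRUCT('Date' AS key, '2022-12-13' AS value),
    STRUCT('Country' AS key, 'Sweden' AS value),
    STRUCT('Brand' AS key, 'Volvo' AS value),
    STRUCT('Date' AS key, '2022-12-10' AS value),
    STRUCT('Country' AS key, 'France' AS value),
    STRUCT('Brand' AS key, 'Renault' AS value)
  ] AS data
)
SELECT
  source,
  -- Assumption: entries 0-2 form record 0, entries 3-5 form record 1, and so on
  DIV(pos, 3) AS record_idx,
  -- Conditional aggregation: keep the value only where the key matches
  MAX(IF(d.key = 'Date', d.value, NULL)) AS Date,
  MAX(IF(d.key = 'Country', d.value, NULL)) AS Country,
  MAX(IF(d.key = 'Brand', d.value, NULL)) AS Brand
FROM
  sources,
  UNNEST(data) AS d WITH OFFSET AS pos  -- pos is the zero-based array index
GROUP BY source, record_idx
ORDER BY source, record_idx;
```

Under that assumption this should return one row per record, that is, the four rows of the desired result plus the helper column `record_idx`. If the entries of one record are not contiguous in the array, the grouping expression would need to change.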
You can use the following query:
WITH sources AS (
  SELECT
    'first_file' AS source,
    [
      STRUCT('Date' AS key, '2022-12-14' AS value),
      STRUCT('Country' AS key, 'Germany' AS value),
      STRUCT('Brand' AS key, 'Mercedes' AS value)
    ] AS data
  UNION ALL
  SELECT
    'second_file' AS source,
    [
      STRUCT('Date' AS key, '2022-12-13' AS value),
      STRUCT('Country' AS key, 'Sweden' AS value),
      STRUCT('Brand' AS key, 'Volvo' AS value)
    ] AS data
)
SELECT
  source,
  -- Each scalar subquery looks up the value for one key inside the row's array
  (SELECT value FROM UNNEST(data) WHERE key = 'Date') AS Date,
  (SELECT value FROM UNNEST(data) WHERE key = 'Country') AS Country,
  (SELECT value FROM UNNEST(data) WHERE key = 'Brand') AS Brand
FROM sources;
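A scalar subquery raises an error when it matches more than one row, which would happen with the question's data, where each key occurs several times per source. As a hedged variation, the sketch below collects every match into an array using `ARRAY(SELECT ...)`. The column names `dates`, `countries`, and `brands` and the offset alias `pos` are illustrative, not part of the original answer.

```sql
WITH sources AS (
  SELECT
    'first_file' AS source,
    [
      STRUCT('Date' AS key, '2022-12-14' AS value),
      STRUCT('Country' AS key, 'Germany' AS value),
      STRUCT('Brand' AS key, 'Mercedes' AS value),
      STRUCT('Date' AS key, '2022-12-15' AS value),
      STRUCT('Country' AS key, 'Germany' AS value),
      STRUCT('Brand' AS key, 'BMW' AS value)
    ] AS data
)
SELECT
  source,
  -- ARRAY(subquery) accepts any number of matching rows;
  -- ordering by the array offset keeps the original order of the entries
  ARRAY(SELECT value FROM UNNEST(data) WITH OFFSET AS pos
        WHERE key = 'Date' ORDER BY pos) AS dates,
  ARRAY(SELECT value FROM UNNEST(data) WITH OFFSET AS pos
        WHERE key = 'Country' ORDER BY pos) AS countries,
  ARRAY(SELECT value FROM UNNEST(data) WITH OFFSET AS pos
        WHERE key = 'Brand' ORDER BY pos) AS brands
FROM sources;
```

This returns one row per source with three parallel arrays rather than one row per record, so a further step would still be needed to reach the exact shape requested in the question.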
You can also use a UDF to centralize the logic:
-- Returns the value stored under key k in an array of (key, value) structs
CREATE TEMP FUNCTION getValue(k STRING, arr ANY TYPE) AS
  ((SELECT value FROM UNNEST(arr) WHERE key = k));

WITH sources AS (
  SELECT
    'first_file' AS source,
    [
      STRUCT('Date' AS key, '2022-12-14' AS value),
      STRUCT('Country' AS key, 'Germany' AS value),
      STRUCT('Brand' AS key, 'Mercedes' AS value)
    ] AS data
  UNION ALL
  SELECT
    'second_file' AS source,
    [
      STRUCT('Date' AS key, '2022-12-13' AS value),
      STRUCT('Country' AS key, 'Sweden' AS value),
      STRUCT('Brand' AS key, 'Volvo' AS value)
    ] AS data
)
SELECT
  source,
  getValue('Date', data) AS Date,
  getValue('Country', data) AS Country,
  getValue('Brand', data) AS Brand
FROM sources;
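The result is:
source | Date | Country | Brand |
---|---|---|---|
first_file | 2022-12-14 | Germany | Mercedes |
second_file | 2022-12-13 | Sweden | Volvo |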