How to Automatically Add STRUCT Elements in BigQuery
I have a BigQuery table with a STRUCT field, and I would like its list of elements to grow automatically whenever I insert a row containing a previously unseen element. Is this possible?
-- initially meta only has elements: hair, eyes
CREATE TEMP TABLE tt AS
SELECT
1 AS id,
STRUCT (
'brown' AS hair,
'brown' AS eyes
) AS meta;
-- now I would like to add a never-before-seen element: weight
INSERT INTO tt
SELECT
2 AS id,
STRUCT (
'brown' AS hair,
160 AS weight
) AS meta;
This obviously does not work and returns the error:
Query column 2 has type STRUCT<hair STRING, weight INT64> which cannot be inserted into column meta, which has type STRUCT<hair STRING, eyes STRING> at [10:1]
The resulting temp table looks like the following after the initial construction:

| id | meta.hair | meta.eyes |
|---|---|---|
| 1 | brown | brown |
Ideally, inserting row 2 would then automatically add the element "weight" to meta:

| id | meta.hair | meta.eyes | meta.weight |
|---|---|---|---|
| 1 | brown | brown | NULL |
| 2 | brown | NULL | 160 |
This is probably wishful thinking.
As a real-world example, I know that Stitch's Webhook --> BigQuery integration somehow achieves this behavior when it syncs data from some of our SaaS products into BigQuery. Stitch handles new, never-before-seen nested fields inside JSON payloads by adding new elements to the corresponding STRUCT fields. I am just not sure how this magic happens.
One way of doing this is with key-value fields in an array. Instead of naming the fields directly, you add a name field plus one value field for each data type you need:
CREATE TEMP TABLE tt AS
SELECT
1 AS id,
[
STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
STRUCT('eyes' as name, 'brown' AS str_value, null as int_value)
] AS meta;
INSERT INTO tt
SELECT
2 AS id,
[
STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
STRUCT('weight' as name, cast(null as string) AS str_value, 160 as int_value)
] AS meta;
select * from tt
Note that an untyped null defaults to INT64, so cast it explicitly whenever the field needs another type, as with cast(null as string) in the second insert above.
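For illustration only, here is a minimal JavaScript sketch of how a flat JSON payload could be reshaped into that key-value layout before it is loaded. The helper name `toKeyValueEntries` is hypothetical and not part of BigQuery or Stitch. It assumes only the `name`, `str_value`, and `int_value` fields used in the example above.

```javascript
// Hypothetical helper: reshape a flat JSON object into the key-value layout above.
// Each entry carries the field name plus one slot per supported type;
// the slot that does not apply to the value stays null.
function toKeyValueEntries(payload) {
  return Object.entries(payload).map(([name, value]) => ({
    name,
    str_value: typeof value === 'string' ? value : null,
    int_value: Number.isInteger(value) ? value : null,
  }));
}

// A payload with a never-before-seen key needs no schema change:
// "weight" simply becomes one more array element.
const entries = toKeyValueEntries({ hair: 'brown', weight: 160 });
console.log(JSON.stringify(entries));
```

Because new keys become extra array elements rather than new STRUCT fields, the table schema stays fixed regardless of which keys arrive.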
Assuming you have two sets of data (table_1 and table_2 in the query below), consider the approach below:
create temp function json_extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function json_extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
create temp table temp_table as (
select id, key, value
from (
select id, to_json_string(meta) json from table_1
union all
select id, to_json_string(meta) from table_2
), unnest(json_extract_keys(json)) key with offset
join unnest(json_extract_values(json)) value with offset
using(offset)
);
execute immediate(select '''
select id, struct(''' || string_agg(distinct key, ',') || ''') meta from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
);
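To see why the join on offset lines each key up with its value, the two UDF bodies can be run as plain JavaScript outside BigQuery. This is only a sketch: the function names are camel-cased copies of the UDFs, and the sample JSON string is an assumed stand-in for what to_json_string(meta) would produce for one row.

```javascript
// Same bodies as the two UDFs above, written as ordinary functions.
function jsonExtractKeys(input) {
  return Object.keys(JSON.parse(input));
}

function jsonExtractValues(input) {
  return Object.values(JSON.parse(input));
}

// Assumed sample input for a single row.
const json = '{"hair":"brown","weight":160}';
const keys = jsonExtractKeys(json);
const values = jsonExtractValues(json);

// Object.keys and Object.values walk the properties in the same order,
// so position i in one array corresponds to position i in the other.
// This is what the `using(offset)` join relies on.
const pairs = keys.map((key, offset) => ({ offset, key, value: values[offset] }));
console.log(pairs);
```

Each resulting pair corresponds to one (key, value) row of temp_table for that id, which the PIVOT step then turns back into columns.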