How to automatically add STRUCT elements in BigQuery

Question

I have a BigQuery table with a STRUCT field, and I would like its list of elements to grow automatically whenever I try to insert a previously unseen element. Is this possible?

-- initially meta only has elements: hair, eyes
CREATE TEMP TABLE tt AS
SELECT
  1 AS id,
  STRUCT (
    'brown' AS hair,
    'brown' AS eyes
  ) AS meta;

-- now I would like to add a never-before-seen element: weight
INSERT INTO tt
SELECT
  2 AS id,
  STRUCT (
    'brown' AS hair,
    160 AS weight
  ) AS meta;

This obviously does not work and returns the error: Query column 2 has type STRUCT<hair STRING, weight INT64> which cannot be inserted into column meta, which has type STRUCT<hair STRING, eyes STRING> at [10:1]

After the initial construction, the temp table looks like this:

id	meta.hair	meta.eyes
1	brown	brown

Ideally, inserting row 2 would then automatically add the element "weight" to meta:

id	meta.hair	meta.eyes	meta.weight
1	brown	brown	NULL
2	brown	NULL	160

This is probably wishful thinking.

As a real-world example, I know that Stitch's Webhook --> BigQuery integration somehow achieves this behavior when it syncs data from some of our SaaS products into BigQuery. Stitch handles new, never-before-seen nested fields inside JSON payloads by adding new elements to the corresponding STRUCT fields. I am just not sure how this magic happens.

Answer 1

One way of doing this is to store key-value pairs in an array. Instead of naming the fields directly, each array element carries a name field plus one value field for each data type you need:

CREATE TEMP TABLE tt AS
SELECT
    1 AS id,
    [
      STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
      STRUCT('eyes' as name, 'brown' AS str_value, null as int_value)
    ] AS meta;

INSERT INTO tt
SELECT
    2 AS id,
    [
      STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
      STRUCT('weight' as name, cast(null as string) AS str_value, 160 as int_value)
    ] AS meta;

select * from tt

Note that an untyped null defaults to INT64, so cast it explicitly (as in cast(null as string) above) whenever a null value needs a different type.
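
Reading the data back out of this key-value layout takes one lookup per attribute. The query below is a minimal sketch, assuming the tt table created above and run in the same script or session, since tt is a temp table. The column aliases hair, eyes and weight are chosen here for illustration; each correlated subquery returns NULL when a row has no entry with that name.

```sql
-- Flatten the key-value array back into one column per attribute.
-- A missing key yields NULL because the scalar subquery returns no row.
SELECT
  id,
  (SELECT str_value FROM UNNEST(meta) WHERE name = 'hair')   AS hair,
  (SELECT str_value FROM UNNEST(meta) WHERE name = 'eyes')   AS eyes,
  (SELECT int_value FROM UNNEST(meta) WHERE name = 'weight') AS weight
FROM tt
ORDER BY id;
```

The trade-off of this design is that new attributes need no schema change at all, but every attribute you want as a column has to be listed explicitly at query time.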

Answer 2

Assuming you have two sets of data, table_1 and table_2, consider the approach below:

-- Helper UDFs: return the keys and the values of a JSON object as string arrays.
create temp function json_extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
  """;
create temp function json_extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
  """;

-- Unpivot both tables into (id, key, value) rows by serializing each STRUCT to JSON
-- and pairing every key with the value at the same array offset.
create temp table temp_table as (
  select id, key, value
  from (
    select id, to_json_string(meta) json from table_1 
    union all
    select id, to_json_string(meta) from table_2 
  ), unnest(json_extract_keys(json)) key with offset
  join unnest(json_extract_values(json)) value with offset
  using(offset)
  );

-- Build the pivot query from the distinct keys actually present, then run it,
-- so the resulting STRUCT contains one field per key seen in either table.
execute immediate(select '''
select id, struct(''' || string_agg(distinct key, ',') || ''') meta from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
);
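
The script above expects table_1 and table_2 to exist already, and the answer does not show their contents. As a minimal sketch for trying it out, the setup below assumes the two tables simply mirror the two rows from the question; that is an assumption made here, not something the answer states. It would need to run first in the same script, because these are temp tables.

```sql
-- Hypothetical inputs, assumed to mirror the question's data:
-- table_1 holds the original row, table_2 the row with the new "weight" field.
CREATE TEMP TABLE table_1 AS
SELECT
  1 AS id,
  STRUCT('brown' AS hair, 'brown' AS eyes) AS meta;

CREATE TEMP TABLE table_2 AS
SELECT
  2 AS id,
  STRUCT('brown' AS hair, 160 AS weight) AS meta;
```

Because the helper functions are declared to return array<string>, every field in the rebuilt meta STRUCT is a STRING, weight included, so numeric fields would need a cast afterwards.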

