
How to Automatically Add STRUCT Elements in BigQuery

I have a BigQuery table with a STRUCT field, and I would like its list of elements to grow automatically whenever I insert a previously unseen element. Is this possible?

-- initially meta only has elements: hair, eyes
CREATE TEMP TABLE tt AS
SELECT
  1 AS id,
  STRUCT (
    'brown' AS hair,
    'brown' AS eyes
  ) AS meta;

-- now I would like to add a never-before-seen element: weight
INSERT INTO tt
SELECT
  2 AS id,
  STRUCT (
    'brown' AS hair,
    160 AS weight
  ) AS meta;

This obviously does not work. It returns the error:

Query column 2 has type STRUCT<hair STRING, weight INT64> which cannot be inserted into column meta, which has type STRUCT<hair STRING, eyes STRING> at [10:1]

After the initial construction, the temp table looks like this:

id | meta.hair | meta.eyes
---|-----------|----------
1  | brown     | brown

Ideally, inserting row 2 would then automatically add the element "weight" to meta:

id | meta.hair | meta.eyes | meta.weight
---|-----------|-----------|------------
1  | brown     | brown     | NULL
2  | brown     | NULL      | 160

This is probably wishful thinking.

As a real-world example, I know that Stitch's Webhook --> BigQuery integration somehow achieves this behavior when it syncs data from some of our SaaS products into BigQuery. Stitch handles new, never-before-seen nested fields inside JSON payloads by adding new elements to the corresponding STRUCT fields. I am just not sure how this magic happens.
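The question does not say how Stitch implements this, and neither answer explains it. One plausible mechanism, which is an assumption here, is schema widening. Before writing a payload, the tool compares its keys with the current STRUCT fields and adds every unseen key as a new NULLABLE field. BigQuery does allow appending NULLABLE fields to an existing table schema, including nested fields inside a RECORD (STRUCT) column, through a schema update. The Python sketch below models only the merge step in memory. The dict-based schema and the helpers infer_type, widen_schema and to_row are hypothetical.

```python
from typing import Any

# Hypothetical in-memory model of a STRUCT schema: field name -> BigQuery type.
Schema = dict[str, str]


def infer_type(value: Any) -> str:
    """Map a JSON scalar to a BigQuery type name (a simplified assumption)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    return "STRING"


def widen_schema(schema: Schema, payload: dict[str, Any]) -> tuple[Schema, list[str]]:
    """Return a copy of schema extended with unseen payload keys, plus the added names.

    Existing fields keep their order and type; new fields are appended after them.
    """
    widened = dict(schema)
    added: list[str] = []
    for name, value in payload.items():
        if name not in widened:
            widened[name] = infer_type(value)
            added.append(name)
    return widened, added


def to_row(schema: Schema, payload: dict[str, Any]) -> dict[str, Any]:
    """Project a payload onto the schema; fields it lacks become None (NULL)."""
    return {name: payload.get(name) for name in schema}


# Row 1 defines the initial schema; row 2 introduces the unseen field "weight".
meta_schema: Schema = {"hair": "STRING", "eyes": "STRING"}
meta_schema, new_fields = widen_schema(meta_schema, {"hair": "brown", "weight": 160})
print(new_fields)  # ['weight']
print(meta_schema)  # {'hair': 'STRING', 'eyes': 'STRING', 'weight': 'INT64'}
print(to_row(meta_schema, {"hair": "brown", "eyes": "brown"}))
# {'hair': 'brown', 'eyes': 'brown', 'weight': None}
```

Run on the question's rows, the sketch adds weight as INT64 and projects row 1 with weight set to None, which matches the desired table. A real pipeline would also have to handle type conflicts, such as a key that arrives as a string one day and as a number the next. This sketch simply keeps the first type it sees.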

One way to do this is to store key-value pairs in an ARRAY of STRUCTs. Instead of naming the fields directly, you add a name field plus one value field for each data type you need:

CREATE TEMP TABLE tt AS
SELECT
    1 AS id,
    [
      STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
      STRUCT('eyes' as name, 'brown' AS str_value, null as int_value)
    ] AS meta;

INSERT INTO tt
SELECT
    2 AS id,
    [
      STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
      STRUCT('weight' as name, cast(null as string) AS str_value, 160 as int_value)
    ] AS meta;

select * from tt

Note that an untyped NULL defaults to INT64. This is why the weight element uses cast(null as string) for str_value.
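The key-value layout avoids schema changes because a new attribute such as weight is just another array element. The element type STRUCT<name STRING, str_value STRING, int_value INT64> never changes. The Python sketch below mirrors that layout in memory using hypothetical encode/decode helpers. It assumes only strings and integers need typed slots.

```python
from typing import Any, Optional, TypedDict


class MetaEntry(TypedDict):
    """One element of the meta array: a name plus one slot per supported type."""
    name: str
    str_value: Optional[str]
    int_value: Optional[int]


def encode(record: dict[str, Any]) -> list[MetaEntry]:
    """Turn {field: value} into key-value entries, filling only the matching slot."""
    entries: list[MetaEntry] = []
    for name, value in record.items():
        if value is None:
            entries.append({"name": name, "str_value": None, "int_value": None})
        elif isinstance(value, int) and not isinstance(value, bool):
            entries.append({"name": name, "str_value": None, "int_value": value})
        else:  # everything else is stored as text in this sketch
            entries.append({"name": name, "str_value": str(value), "int_value": None})
    return entries


def decode(entries: list[MetaEntry]) -> dict[str, Any]:
    """Rebuild {field: value}, taking whichever typed slot is non-null."""
    return {
        e["name"]: e["int_value"] if e["int_value"] is not None else e["str_value"]
        for e in entries
    }


# The two rows from the answer: same element type, different sets of names.
row1 = encode({"hair": "brown", "eyes": "brown"})
row2 = encode({"hair": "brown", "weight": 160})
print(decode(row2))  # {'hair': 'brown', 'weight': 160}
print([decode(r).get("weight") for r in (row1, row2)])  # [None, 160]
```

In BigQuery itself, a single attribute can be read back with a correlated subquery over the array. For example, SELECT id, (SELECT int_value FROM UNNEST(meta) WHERE name = 'weight') AS weight FROM tt returns NULL for rows that have no weight entry. The trade-off is that every new value type needs another slot, and queries must filter by name instead of addressing fields directly.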

Assuming you have two sets of data,

table_1

[image: contents of table_1]

and table_2

[image: contents of table_2]

Consider the approach below:

-- JS UDFs that return the top-level keys / values of a JSON object as string arrays
create temp function json_extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
  """;
create temp function json_extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
  """;

-- Unpivot both tables into (id, key, value) rows; keys and values are paired by offset
create temp table temp_table as (
  select id, key, value
  from (
    select id, to_json_string(meta) json from table_1 
    union all
    select id, to_json_string(meta) from table_2 
  ), unnest(json_extract_keys(json)) key with offset
  join unnest(json_extract_values(json)) value with offset
  using(offset)
  );

-- Generate and run a PIVOT query with one column per distinct key, wrapped into a STRUCT
execute immediate(select '''
select id, struct(''' || string_agg(distinct key, ',') || ''') meta from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
); 

with output:

[image: query result]
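The second answer runs in two steps. First, the JS UDFs split each to_json_string(meta) into parallel key and value arrays, and these are zipped back together by offset to give one (id, key, value) row per field. Then EXECUTE IMMEDIATE builds a PIVOT whose column list is every distinct key, so each row gets every field, with NULL where it had none. The Python sketch below mimics both steps in memory on hypothetical rows shaped like the question's data. It assumes flat STRUCTs. It also reproduces one side effect of the answer: json_extract_values is declared to return array<string>, so numeric values such as 160 come back as strings.

```python
import json
from typing import Any, Optional

LongRow = tuple[int, str, str]  # (id, key, value), like temp_table


def unpivot(rows: list[tuple[int, dict[str, Any]]]) -> list[LongRow]:
    """Mimic the temp_table step: one (id, key, value) row per top-level key.

    json.dumps/json.loads stand in for TO_JSON_STRING and JSON.parse; pairing
    keys and values by index mirrors the JOIN ... USING(offset).
    """
    out: list[LongRow] = []
    for row_id, meta in rows:
        obj = json.loads(json.dumps(meta))
        keys, values = list(obj.keys()), list(obj.values())
        for offset in range(len(keys)):
            # str() mimics the UDF's ARRAY<STRING> return type (flat values only).
            out.append((row_id, keys[offset], str(values[offset])))
    return out


def pivot(long_rows: list[LongRow]) -> dict[int, dict[str, Optional[str]]]:
    """Mimic the dynamic PIVOT: every distinct key becomes a field, gaps are None."""
    # Sorted only for a deterministic field order in this sketch.
    all_keys = sorted({key for _, key, _ in long_rows})
    result: dict[int, dict[str, Optional[str]]] = {}
    for row_id, key, value in long_rows:
        meta = result.setdefault(row_id, {k: None for k in all_keys})
        meta[key] = value  # like ANY_VALUE: keep a value for this (id, key) pair
    return result


# Hypothetical sample data; the answer's real tables were only shown as images.
table_1 = [(1, {"hair": "brown", "eyes": "brown"})]
table_2 = [(2, {"hair": "brown", "weight": 160})]
merged = pivot(unpivot(table_1 + table_2))
print(merged[1])  # {'eyes': 'brown', 'hair': 'brown', 'weight': None}
print(merged[2])  # {'eyes': None, 'hair': 'brown', 'weight': '160'}
```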

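Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.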

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM