简体   繁体   中英

How to Automatically Add STRUCT Elements in BigQuery

I have a BigQuery table with a STRUCT field and I would like to be able to automatically increase its list of elements whenever I attempt to insert a previously unseen element. Is this possible?

-- initially meta only has elements: hair, eyes
CREATE TEMP TABLE tt AS
SELECT
  1 AS id,
  STRUCT (
    'brown' AS hair,
    'brown' AS eyes
  ) AS meta;

-- now I would like to add a neverbefore seen element: weight
INSERT INTO tt
SELECT
  2 AS id,
  STRUCT (
    'brown' AS hair,
    160 AS weight
  ) AS meta;

This obviously does not work and returns the error Query column 2 has type STRUCT<hair STRING, weight INT64> which cannot be inserted into column meta, which has type STRUCT<hair STRING, eyes STRING> at [10:1] .

The resulting temp table looks like the following after the initial construction:

id meta.hair meta.eyes
1 brown brown

And then it would ideally automatically add the element "weight" to meta after inserting row 2:

id meta.hair meta.eyes meta.weight
1 brown brown NULL
2 brown NULL 160

This is probably wishful thinking.

As a real-world example, I know that Stitch 's Webhook --> BigQuery integration is somehow achieving this behavior when it syncs data from some of our SaaS products into BigQuery. Stitch handles new, never-seen-before nested fields inside JSON payloads by adding new elements to corresponding STRUCT fields. I am just not sure how this magic is happening.

One way of doing this is with key-value fields in an array. Instead of naming the fields directly, you add a name field and a field for each data type you need:

CREATE TEMP TABLE tt AS
SELECT
    1 AS id,
    [
      STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
      STRUCT('eyes' as name, 'brown' AS str_value, null as int_value)
    ] AS meta;

INSERT INTO tt
SELECT
    2 AS id,
    [
      STRUCT('hair' as name, 'brown' AS str_value, null as int_value),
      STRUCT('weight' as name, cast(null as string) AS str_value, 160 as int_value)
    ] AS meta;

select * from tt

Note, that the default data type is int64 (in case you're not explicit using null )

Assuming your have two sets of data

table_1
在此处输入图像描述

and table_2
在此处输入图像描述

Consider below approach

create temp function json_extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
  """;
create temp function json_extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
  """;

create temp table temp_table as (
  select id, key, value
  from (
    select id, to_json_string(meta) json from table_1 
    union all
    select id, to_json_string(meta) from table_2 
  ), unnest(json_extract_keys(json)) key with offset
  join unnest(json_extract_values(json)) value with offset
  using(offset)
  );

execute immediate(select '''
select id, struct(''' || string_agg(distinct key, ',') || ''') meta from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
); 

with output

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM