How to flatten a column of large JSON strings with different numbers of keys to a table in BigQuery

I have a Google BigQuery table with a column containing large JSON strings. Each row has a different number of keys and nested keys, which I would like to flatten into columns.

My table looks as follows:

id  payload
1   {"key1":{"value":"1"},"key2":2,"key3":1,"key4":"abcde","version":10}
2   {"key1":{"value":"2"},"key2":5,"key3":2,"key4":"defg","version":11}

I have managed to extract single columns by using the BigQuery functions JSON_EXTRACT and/or JSON_EXTRACT_SCALAR:

SELECT id, JSON_EXTRACT(payload, '$.key1') as key1
FROM `project.dataset.table`

etc. However, I don't want to hand-code more than 100 keys that are nested in the JSON column. There has to be a better way!

I am grateful for any kind of support!

Consider the approach below:

-- Return the top-level keys of a JSON object string.
create temp function extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
  """;
-- Return the top-level values of a JSON object string, in the same order as extract_keys.
create temp function extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
  """;
-- Flatten nested objects into a single-level JSON object string whose keys are dot-separated paths.
create temp function extract_all_leaves(input string) returns string language js as '''
  function flattenObj(obj, parent = '', res = {}){
    for(let key in obj){
        let propName = parent ? parent + '.' + key : key;
        if(typeof obj[key] == 'object'){
            // Nested objects (and arrays, by index) are walked recursively.
            flattenObj(obj[key], propName, res);
        } else {
            res[propName] = obj[key];
        }
    }
    return JSON.stringify(res);
  }
  return flattenObj(JSON.parse(input));
  ''';

-- Long format: one row per (id, key, value); offset keeps each key aligned with its value.
create temp table temp_table as (
  select offset, key, value, id
  from your_table t,
  unnest([struct(extract_all_leaves(payload) as leaves)]),
  unnest(extract_keys(leaves)) key with offset
  join unnest(extract_values(leaves)) value with offset
  using(offset)
);

-- Build the pivot column list from the distinct keys (dots replaced by "__"), then run the pivot dynamically.
execute immediate (select '''
  select * from (select * except(offset) from temp_table)
  pivot (any_value(value) for replace(key, '.', '__') in (''' || keys_list || '''
  ))'''
from (select string_agg('"' || replace(key, '.', '__') || '"', ',' order by offset) keys_list from (
  select key, min(offset) as offset from temp_table group by key
))
);
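
To see what the flattening step produces before running anything in BigQuery, the JavaScript can be tried locally. The following is a minimal sketch, assuming Node.js; it repeats the logic of extract_all_leaves but returns the object instead of a JSON string. The payload is the first row of the question, with the closing quote after abcde restored (an assumption about the intended data).

```javascript
// Standalone copy of the flattening logic, runnable in Node.js without BigQuery.
function flattenObj(obj, parent = '', res = {}) {
  for (const key in obj) {
    const propName = parent ? parent + '.' + key : key;
    if (typeof obj[key] === 'object') {
      // Nested objects (and arrays, by index) are walked recursively.
      flattenObj(obj[key], propName, res);
    } else {
      res[propName] = obj[key];
    }
  }
  return res;
}

// Sample payload from the question (missing quote restored; assumption).
const payload = '{"key1":{"value":"1"},"key2":2,"key3":1,"key4":"abcde","version":10}';
const leaves = flattenObj(JSON.parse(payload));

console.log(Object.keys(leaves));   // [ 'key1.value', 'key2', 'key3', 'key4', 'version' ]
console.log(Object.values(leaves)); // [ '1', 2, 1, 'abcde', 10 ]
```

The two arrays line up position by position, which is what lets the SQL join the unnested keys and values on their offset.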

If applied to the sample data in your question

[image: sample data]

the output is

[image: query output]
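
To make the shape of the result more concrete, here is a rough in-memory sketch of the same long-then-wide idea in plain JavaScript (Node.js assumed). It is not the BigQuery query itself: the rows array is a hypothetical stand-in for the table, the payloads have their missing quotes restored, and columns are ordered by first appearance rather than by min(offset).

```javascript
// Hypothetical in-memory stand-in for the table in the question.
const rows = [
  { id: 1, payload: '{"key1":{"value":"1"},"key2":2,"key3":1,"key4":"abcde","version":10}' },
  { id: 2, payload: '{"key1":{"value":"2"},"key2":5,"key3":2,"key4":"defg","version":11}' },
];

// Same flattening idea as extract_all_leaves: nested keys become dot-separated paths.
function flatten(obj, parent = '', res = {}) {
  for (const key in obj) {
    const path = parent ? parent + '.' + key : key;
    if (typeof obj[key] === 'object') flatten(obj[key], path, res);
    else res[path] = obj[key];
  }
  return res;
}

// Step 1: long format, one entry per (id, key, value), like temp_table.
const long = rows.flatMap(({ id, payload }) =>
  Object.entries(flatten(JSON.parse(payload))).map(([key, value]) => ({ id, key, value }))
);

// Step 2: column list, with '.' replaced by '__' as in the pivot clause.
const columns = [...new Set(long.map((r) => r.key.replace(/\./g, '__')))];

// Step 3: wide format, one object per id; keys missing from a payload stay null.
const wide = new Map();
for (const { id, key, value } of long) {
  if (!wide.has(id)) {
    wide.set(id, { id, ...Object.fromEntries(columns.map((c) => [c, null])) });
  }
  // Values are stringified because the SQL UDFs return array<string>.
  wide.get(id)[key.replace(/\./g, '__')] = String(value);
}

console.table([...wide.values()]);
```

For the two sample rows this yields one row per id with the columns key1__value, key2, key3, key4, and version.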
