How to flatten a column of large JSON strings with different numbers of keys to a table in BigQuery
I have a Google BigQuery table with a column containing large JSON strings. In each row, there is a different number of keys and nested keys that I would like to flatten into columns.
My table looks as follows:
| id | payload |
|---|---|
| 1 | {"key1":{"value":"1"},"key2":2,"key3":1,"key4":"abcde","version":10} |
| 2 | {"key1":{"value":"2"},"key2":5,"key3":2,"key4":"defg","version":11} |
I have managed to extract single columns by using the BigQuery functions JSON_EXTRACT and/or JSON_EXTRACT_SCALAR:
SELECT id, JSON_EXTRACT(payload, '$.key1') AS key1
FROM `project.dataset.table`
and so on. However, I don't want to hand-code more than 100 keys that are nested in the JSON column. There has to be a better way!
I am grateful for any kind of support!
Consider the approach below:
-- Top-level keys of a JSON object string, in insertion order.
create temp function extract_keys(input string) returns array<string> language js as """
  return Object.keys(JSON.parse(input));
""";

-- Top-level values of a JSON object string, in the same order as extract_keys.
create temp function extract_values(input string) returns array<string> language js as """
  return Object.values(JSON.parse(input));
""";

-- Flattens nested objects into a single-level JSON object with dotted keys, e.g. key1.value.
create temp function extract_all_leaves(input string) returns string language js as '''
  function flattenObj(obj, parent = '', res = {}){
    for(let key in obj){
      let propName = parent ? parent + '.' + key : key;
      if(typeof obj[key] == 'object'){
        flattenObj(obj[key], propName, res);
      } else {
        res[propName] = obj[key];
      }
    }
    return JSON.stringify(res);
  }
  return flattenObj(JSON.parse(input));
''';

-- One row per (id, leaf key, leaf value); offset preserves the key order within each payload.
create temp table temp_table as (
  select offset, key, value, id
  from your_table t,
  unnest([struct(extract_all_leaves(payload) as leaves)]),
  unnest(extract_keys(leaves)) key with offset
  join unnest(extract_values(leaves)) value with offset
  using(offset)
);

-- Build the pivot column list from the distinct keys and run the generated statement.
execute immediate (select '''
select * from (select * except(offset) from temp_table)
pivot (any_value(value) for replace(key, '.', '__') in (''' || keys_list || '''
))'''
from (select string_agg('"' || replace(key, '.', '__') || '"', ',' order by offset) keys_list from (
  select key, min(offset) as offset from temp_table group by key
))
);
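If you want to preview which keys the flattening step will produce before running anything in BigQuery, the JavaScript inside extract_all_leaves can be tried locally. The following is only a sketch, assuming Node.js; `samplePayload` and `leaves` are names I made up for illustration, and unlike the UDF this version returns the object itself rather than its JSON string so it is easier to inspect.

```javascript
// Local sketch of the flattening logic used inside extract_all_leaves.
// Assumption: run with Node.js; samplePayload mirrors row 1 of the question.
function flattenObj(obj, parent = '', res = {}) {
  for (const key in obj) {
    const propName = parent ? parent + '.' + key : key;
    if (typeof obj[key] === 'object') {
      // Nested objects and arrays are walked recursively. A null value also
      // has typeof 'object'; it is visited and contributes no keys.
      flattenObj(obj[key], propName, res);
    } else {
      // Leaf value: store it under its dotted path.
      res[propName] = obj[key];
    }
  }
  return res;
}

const samplePayload =
  '{"key1":{"value":"1"},"key2":2,"key3":1,"key4":"abcde","version":10}';
const leaves = flattenObj(JSON.parse(samplePayload));

console.log(Object.keys(leaves));
// [ 'key1.value', 'key2', 'key3', 'key4', 'version' ]
```

Note two side effects of the `typeof ... 'object'` check: array elements end up under index-based keys such as `a.0`, and keys whose value is null are dropped from the result.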
When applied to the sample data in your question, the output has one row per id and one column per flattened key: id, key1__value, key2, key3, key4 and version.
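The column names come from the string that the execute immediate statement assembles at run time. As a rough illustration of that assembly, here is a plain JavaScript sketch, assuming Node.js and assuming the distinct keys are the five leaf keys of the sample rows; `keys`, `keysList` and `statement` are hypothetical names, and in the real query this work is done by string_agg and replace.

```javascript
// Sketch of how the dynamic PIVOT statement text is put together.
// Assumption: keys holds the distinct flattened keys, ordered by their
// first offset, as the inner query over temp_table would return them.
const keys = ['key1.value', 'key2', 'key3', 'key4', 'version'];

// Mirrors string_agg('"' || replace(key, '.', '__') || '"', ','):
// dots are replaced because a dot is not usable in a pivot column name here.
const keysList = keys
  .map((key) => '"' + key.split('.').join('__') + '"')
  .join(',');

// Mirrors the concatenation of the two SQL string literals around keys_list.
const statement =
  'select * from (select * except(offset) from temp_table)\n' +
  "pivot (any_value(value) for replace(key, '.', '__') in (" + keysList + '\n' +
  '))';

console.log(statement);
```

Because the list is rebuilt from temp_table on every run, new keys appearing in later payloads become new columns without any change to the query.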