简体   繁体   中英

How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tools imports my data in bigquery and generates/extends automatically the schema for dynamic nested keys (in the schema below, under properties )

It looks like this

在此处输入图像描述

How can I get the list of nested keys of a repeated record? so for example I can group by properties when those items have said property non-null?

I have tried

    select column_name
    from my_schema.INFORMATION_SCHEMA.COLUMNS
    where
        table_name = 'my_table
        

But it will only list first level keys

From the picture above, I want, as a first step, a SQL query that returns

message
user_id
seeker 
liker_id 
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt 
...

My real goal through, is to group/count my data based on a non-null value of a 2nd level nested properties properties.filters.[filter_type]

The schema may evolve when our application adds more filters, so this need to be dynamically generated, I can't just hard-code the list of nested keys.

Note: this is very similar to this question How to extract all the keys in a JSON object with BigQuery but in my case my data is already in a shcema and it's not a JSON object

EDIT:

Suppose I have a list of such records with nested properties, how do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for wihch said property is not null?

Example input (properties.x are dynamic and not known by the programmer)

search_id properties.filters.school properties.filters.type
1 MIT master
2 Princetown null
3 null master

Example output

search_id enabled_filters
1 ["school", "type"]
2 ["school"]
3 ["type"]

The field properties is not nested by array only by structures. Then a UDF in JavaScript to parse thise field should work fast enough.

CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL,fullname Bool)
RETURNS Array<String>
LANGUAGE js AS """
function test(input,old){
  var out=[]
  for(let x in input){
    let te=input[x];
    out=out.concat(te==null ? (shownull?[x+'==null']:[]) : typeof te=='object' ? test(te,old+x+'.') : [fullname ? old+x : x]  );
  }
  return  out;
  Object.keys(JSON.parse(input));

}
return test(JSON.parse(input),"");
""";
with tbl as (select struct(1 as alpha,struct(2 as x, 3 as y,[1,2,3] as z ) as B) A from unnest(generate_array(1,10*1))
 union all select struct(null,struct(null,1,[999])) ) 
select *,
TO_JSON_STRING (A ) as string_output,
jsonObjectKeys(TO_JSON_STRING (A),true,false) as output1,
jsonObjectKeys(TO_JSON_STRING (A),false,true) as output2,
concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING (A),false,true),'","' ) ,'"]') as output_sring,
jsonObjectKeys(TO_JSON_STRING (A.B),false,true) as outpu

 from tbl

Have you looked at COLUMN_FIELD_PATHS ? It should give you the paths for all columns.

select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'

[https://cloud.google.com/bigquery/docs/information-schema-column-field-paths]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM