
How to retrieve the list of dynamic nested keys of BigQuery nested records

My ELT tool imports my data into BigQuery and automatically generates/extends the schema for dynamic nested keys (in the schema below, under properties).

It looks like this:

[image: screenshot of the table schema]

How can I get the list of nested keys of a repeated record, so that, for example, I can group by properties when those items have said property non-null?

I have tried:

    select column_name
    from my_schema.INFORMATION_SCHEMA.COLUMNS
    where
        table_name = 'my_table'

But it only lists first-level keys.

From the picture above, I want, as a first step, a SQL query that returns:

message
user_id
seeker 
liker_id 
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt 
...

My real goal, though, is to group/count my data based on a non-null value of the 2nd-level nested properties properties.filters.[filter_type].

The schema may evolve when our application adds more filters, so this needs to be dynamically generated; I can't just hard-code the list of nested keys.

Note: this is very similar to the question "How to extract all the keys in a JSON object with BigQuery", but in my case my data is already in a schema and is not a JSON object.

EDIT:

Suppose I have a list of such records with nested properties. How do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties for which said property is not null?

Example input (the properties.x keys are dynamic and not known by the programmer):

search_id  properties.filters.school  properties.filters.type
1          MIT                        master
2          Princetown                 null
3          null                       master

Example output:

search_id  enabled_filters
1          ["school", "type"]
2          ["school"]
3          ["type"]

The field properties is nested only by structs, not by arrays, so a JavaScript UDF that parses this field should work fast enough.

    CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL, fullname BOOL)
    RETURNS ARRAY<STRING>
    LANGUAGE js AS """
    // Recursively collect the keys of a parsed JSON object.
    // shownull: also list keys whose value is null, as 'key==null'.
    // fullname: return the dotted path (e.g. 'B.x') instead of the bare key.
    function test(input, old) {
      var out = [];
      for (let x in input) {
        let te = input[x];
        out = out.concat(
          te == null ? (shownull ? [x + '==null'] : [])
          // Nested struct: recurse. Arrays are objects too, so their indices are listed as keys.
          : typeof te == 'object' ? test(te, old + x + '.')
          : [fullname ? old + x : x]
        );
      }
      return out;
    }
    return test(JSON.parse(input), "");
    """;

    -- Sample data: ten identical rows plus one row with null fields.
    WITH tbl AS (
      SELECT STRUCT(1 AS alpha, STRUCT(2 AS x, 3 AS y, [1, 2, 3] AS z) AS B) AS A
      FROM UNNEST(GENERATE_ARRAY(1, 10 * 1))
      UNION ALL
      SELECT STRUCT(NULL, STRUCT(NULL, 1, [999]))
    )
    SELECT *,
      TO_JSON_STRING(A) AS string_output,
      jsonObjectKeys(TO_JSON_STRING(A), true, false) AS output1,
      jsonObjectKeys(TO_JSON_STRING(A), false, true) AS output2,
      CONCAT('["', ARRAY_TO_STRING(jsonObjectKeys(TO_JSON_STRING(A), false, true), '","'), '"]') AS output_string,
      jsonObjectKeys(TO_JSON_STRING(A.B), false, true) AS output3
    FROM tbl
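Applied to the example from the question, the same idea reduces to keeping the keys of one struct whose value is not null. Here is a minimal sketch in plain JavaScript that can be run outside BigQuery to check the logic. The helper name enabledFilters and the rows array are hypothetical, and the rows are assumed to be what parsing TO_JSON_STRING(properties.filters) would give for each record.

```javascript
// Hypothetical helper: names of the keys whose value is neither null nor undefined.
function enabledFilters(filters) {
  return Object.keys(filters || {}).filter((key) => filters[key] != null);
}

// Rows mirroring the example input from the question (assumed shape).
const rows = [
  { search_id: 1, filters: { school: "MIT", type: "master" } },
  { search_id: 2, filters: { school: "Princetown", type: null } },
  { search_id: 3, filters: { school: null, type: "master" } },
];

// Attach the list of non-null filter names to each row.
const result = rows.map((row) => ({
  search_id: row.search_id,
  enabled_filters: enabledFilters(row.filters),
}));

console.log(JSON.stringify(result));
```

Because the keys are read from each row's JSON at run time, nothing needs to change when the application adds a new filter.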

Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns, including nested ones.

    select field_path
    from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
    where table_name = '<table>'

https://cloud.google.com/bigquery/docs/information-schema-column-field-paths
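Once the field paths are available, the dynamic part can be done client-side: pick the direct children of properties.filters and generate the query text from them. The sketch below is one possible way to do that, not a tested recipe. The helper names nestedKeys and enabledFiltersExpression and the sample field_path values are assumptions, and the generated SQL expression has not been run against BigQuery.

```javascript
// Hypothetical helper: direct child keys of `prefix` among a list of field paths.
function nestedKeys(fieldPaths, prefix) {
  const head = prefix + ".";
  return fieldPaths
    .filter((path) => path.startsWith(head))
    .map((path) => path.slice(head.length))
    .filter((rest) => !rest.includes(".")); // drop deeper levels
}

// Hypothetical helper: SQL expression listing the keys whose field is non-null.
// Each key yields its name when the field is set and NULL otherwise; NULLs are then filtered out.
function enabledFiltersExpression(keys, prefix) {
  const items = keys.map(
    (key) => `IF(${prefix}.${key} IS NOT NULL, '${key}', NULL)`
  );
  return `ARRAY(SELECT k FROM UNNEST([${items.join(", ")}]) AS k WHERE k IS NOT NULL)`;
}

// Sample field_path values, assumed for illustration.
const fieldPaths = [
  "search_id",
  "properties",
  "properties.filters",
  "properties.filters.school",
  "properties.filters.type",
];

const keys = nestedKeys(fieldPaths, "properties.filters");
const expression = enabledFiltersExpression(keys, "properties.filters");

console.log(keys);
console.log(`SELECT search_id, ${expression} AS enabled_filters FROM my_schema.my_table`);
```

Regenerating the expression whenever the schema changes keeps the query in step with new filters without hard-coding the key list.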
