How to retrieve the list of dynamic nested keys of BigQuery nested records
My ELT tool imports my data into BigQuery and automatically generates/extends the schema for dynamic nested keys (in the schema below, under properties).
It looks like this:
How can I get the list of nested keys of a repeated record, so that, for example, I can group by properties when those items have that property set to a non-null value?
I have tried:
select column_name
from my_schema.INFORMATION_SCHEMA.COLUMNS
where
table_name = 'my_table'
But it only lists the first-level keys.
From the picture above, as a first step, I want a SQL query that returns:
message
user_id
seeker
liker_id
rateable_id
rateable_type
from_organization
likeable_type
company
existing_attempt
...
My real goal, though, is to group/count my data based on a non-null value of a second-level nested property, properties.filters.[filter_type].
The schema may evolve when our application adds more filters, so this needs to be generated dynamically; I can't just hard-code the list of nested keys.
Note: this is very similar to the question "How to extract all the keys in a JSON object with BigQuery", but in my case my data is already in a schema and is not a JSON object.
EDIT:
Suppose I have a list of such records with nested properties. How do I write a SQL query that adds a field "enabled_filters" which aggregates, for each item, the list of properties whose value is not null?
Example input (the properties.x keys are dynamic and not known in advance by the programmer):
search_id | properties.filters.school | properties.filters.type
---|---|---
1 | MIT | master
2 | Princetown | null
3 | null | master
Example output:
search_id | enabled_filters
---|---
1 | ["school", "type"]
2 | ["school"]
3 | ["type"]
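To make the intended result concrete, here is a minimal sketch in plain JavaScript (outside BigQuery) that computes enabled_filters for rows shaped like the example input and then counts rows per filter, which is the grouping goal described above. It assumes each row has already been read into an object such as {search_id: 1, properties: {filters: {school: "MIT", type: "master"}}}; the helper names enabledFilters and countByFilter are hypothetical.

```javascript
// Hypothetical helper: names of the keys under properties.filters whose value is not null/undefined.
function enabledFilters(row) {
  const filters = (row.properties && row.properties.filters) || {};
  return Object.keys(filters).filter((key) => filters[key] != null);
}

// Hypothetical helper: count how many rows have each filter set.
function countByFilter(rows) {
  const counts = {};
  for (const row of rows) {
    for (const key of enabledFilters(row)) {
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return counts;
}

// The example input from the question, as plain objects.
const rows = [
  { search_id: 1, properties: { filters: { school: "MIT", type: "master" } } },
  { search_id: 2, properties: { filters: { school: "Princetown", type: null } } },
  { search_id: 3, properties: { filters: { school: null, type: "master" } } },
];

for (const row of rows) {
  console.log(row.search_id, JSON.stringify(enabledFilters(row)));
}
console.log(countByFilter(rows));
```

Because the filter names are read from the data rather than listed in the code, a new filter key shows up in both the per-row lists and the counts without any change to the helpers.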
The field properties is not nested in arrays, only in structs. So a JavaScript UDF that parses this field (serialized with TO_JSON_STRING) should be fast enough.
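CREATE TEMP FUNCTION jsonObjectKeys(input STRING, shownull BOOL, fullname BOOL)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // Recursively collect the keys of all non-object (leaf) values.
  // shownull: also emit 'key==null' for null values instead of skipping them.
  // fullname: emit dotted paths such as 'B.x' instead of bare key names.
  // Arrays are objects in JavaScript, so their elements show up as index keys (e.g. 'B.z.0').
  function test(obj, prefix) {
    var out = [];
    for (let x in obj) {
      let te = obj[x];
      out = out.concat(
        te == null ? (shownull ? [x + '==null'] : [])
        : typeof te == 'object' ? test(te, prefix + x + '.')
        : [fullname ? prefix + x : x]);
    }
    return out;
  }
  return test(JSON.parse(input), "");
""";
-- Sample data: ten identical rows plus one row that contains NULLs.
with tbl as (
  select struct(1 as alpha, struct(2 as x, 3 as y, [1, 2, 3] as z) as B) A
  from unnest(generate_array(1, 10))
  union all
  select struct(null, struct(null, 1, [999]))
)
select *,
  TO_JSON_STRING(A) as string_output,
  jsonObjectKeys(TO_JSON_STRING(A), true, false) as output1,  -- bare key names, nulls reported
  jsonObjectKeys(TO_JSON_STRING(A), false, true) as output2,  -- full paths, nulls skipped
  concat('["', array_to_string(jsonObjectKeys(TO_JSON_STRING(A), false, true), '","'), '"]') as output_string,
  jsonObjectKeys(TO_JSON_STRING(A.B), false, true) as output3  -- keys of the inner struct B only
from tbl;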
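To see what output1 and output2 contain without running BigQuery, the sketch below ports the same traversal to plain JavaScript (for example Node.js), passing shownull and fullname explicitly instead of reading them from the UDF's arguments. The two JSON strings are assumptions about what TO_JSON_STRING(A) produces for the two kinds of sample rows, and jsonObjectKeysJs is a hypothetical name used only for local testing.

```javascript
// Plain-JS port of the jsonObjectKeys UDF body (hypothetical name, for local testing only).
function jsonObjectKeysJs(input, shownull, fullname) {
  function walk(obj, prefix) {
    let out = [];
    for (const key in obj) {
      const value = obj[key];
      if (value == null) {
        // Null leaves are skipped unless shownull is set; the marker uses the bare key name.
        if (shownull) out.push(key + "==null");
      } else if (typeof value === "object") {
        // Structs (and arrays, which are objects in JS) are walked recursively.
        out = out.concat(walk(value, prefix + key + "."));
      } else {
        out.push(fullname ? prefix + key : key);
      }
    }
    return out;
  }
  return walk(JSON.parse(input), "");
}

// Assumed TO_JSON_STRING(A) output for the two kinds of sample rows.
const fullRow = '{"alpha":1,"B":{"x":2,"y":3,"z":[1,2,3]}}';
const nullRow = '{"alpha":null,"B":{"x":null,"y":1,"z":[999]}}';

console.log(jsonObjectKeysJs(fullRow, false, true));
console.log(jsonObjectKeysJs(nullRow, true, false));
// The question's case: the non-null keys of one properties.filters value.
console.log(jsonObjectKeysJs('{"school":"Princetown","type":null}', false, false));
```

Note that the array field z is expanded into index paths such as B.z.0, because typeof returns "object" for arrays. If array elements should be treated as leaves instead, an Array.isArray check before the recursive branch would do that.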
Have you looked at COLUMN_FIELD_PATHS? It should give you the paths for all columns, including the nested fields of records:
select field_path from my_schema.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS where table_name = '<table>'
https://cloud.google.com/bigquery/docs/information-schema-column-field-paths
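Because COLUMN_FIELD_PATHS returns the paths as data, one way to avoid hard-coding the filter names is to generate the enabled_filters query from those paths in a small client script. This is a minimal sketch under several assumptions: the field_path values have already been fetched (the sample list is illustrative, not real output), properties and properties.filters are plain non-repeated STRUCTs as the first answer states, there is at least one filter field, and every filter name is a plain identifier that needs no backtick quoting. The helpers filterNames and buildEnabledFiltersSql and the table name my_schema.my_table are hypothetical.

```javascript
// Hypothetical helper: keep only direct children of the given prefix,
// e.g. "properties.filters.school" -> "school".
function filterNames(fieldPaths, prefix) {
  const names = [];
  for (const path of fieldPaths) {
    if (!path.startsWith(prefix + ".")) continue;
    const rest = path.slice(prefix.length + 1);
    if (rest !== "" && !rest.includes(".")) names.push(rest);
  }
  return names;
}

// Hypothetical helper: build a BigQuery query that returns, per row, the names
// of the filters whose value is not NULL, in schema order.
function buildEnabledFiltersSql(fieldPaths, table) {
  const prefix = "properties.filters";
  const candidates = filterNames(fieldPaths, prefix).map(
    (name) => `IF(${prefix}.${name} IS NOT NULL, '${name}', NULL)`
  );
  return [
    "SELECT search_id,",
    `  ARRAY(SELECT f FROM UNNEST([${candidates.join(", ")}]) AS f WITH OFFSET AS o` +
      " WHERE f IS NOT NULL ORDER BY o) AS enabled_filters",
    `FROM ${table}`,
  ].join("\n");
}

// Illustrative field paths, shaped like COLUMN_FIELD_PATHS rows for the example table.
const fieldPaths = [
  "search_id",
  "properties",
  "properties.filters",
  "properties.filters.school",
  "properties.filters.type",
];

console.log(buildEnabledFiltersSql(fieldPaths, "my_schema.my_table"));
```

The generated query only changes when the schema does, so it can be rebuilt each time the ELT tool adds a new filter column.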