
Return elements of Redshift JSON array on separate rows

I have a Redshift table that looks like this:

 id | metadata
---------------------------------------------------------------------------
 1  | [{"pet":"dog"},{"country":"uk"}]
 2  | [{"pet":"cat"}]
 3  | []
 4  | [{"country":"germany"},{"education":"masters"},{"country":"belgium"}]
  • All array elements have just one field.
  • There is no guarantee that a particular field will appear in any of an array's elements.
  • A field name can be repeated within an array.
  • The array elements can be in any order.

I want to get back a table that looks like this:

 id |   field   |  value
------------------------
 1  | pet       | dog
 1  | country   | uk
 2  | pet       | cat
 4  | country   | germany
 4  | education | masters
 4  | country   | belgium

I can then combine this with my queries on the rest of the input table.

I have tried playing around with the Redshift JSON functions, but without being able to write functions, use loops, or have variables in Redshift, I really can't see a way to do this!

Please let me know if I can clarify anything else.

Thanks to this inspired blog post, I've been able to craft a solution. This is:

  1. Create a look-up table to effectively 'iterate' over the elements of each array. The number of rows in this table has to be equal to or greater than the maximum number of elements in any array. Let's say this is 4 (it can be calculated using SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table):

     CREATE VIEW seq_0_to_3 AS (
         SELECT 0 AS i UNION ALL
         SELECT 1 UNION ALL
         SELECT 2 UNION ALL
         SELECT 3
     );
  2. From this, we can create one row per JSON element:

     WITH exploded_array AS (
         SELECT id,
                JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
         FROM input_table, seq_0_to_3 AS seq
         WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
     )
     SELECT *
     FROM exploded_array;

    Producing:

      id | json
     ------------------------------
      1  | {"pet":"dog"}
      1  | {"country":"uk"}
      2  | {"pet":"cat"}
      4  | {"country":"germany"}
      4  | {"education":"masters"}
      4  | {"country":"belgium"}
  3. However, I needed to extract the field names and values. As I can't see any way to extract JSON field names using Redshift's limited functions, I'll do this using a regular expression:

     WITH exploded_array AS (
         SELECT id,
                JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
         FROM input_table, seq_0_to_3 AS seq
         WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
     )
     SELECT id,
            field,
            JSON_EXTRACT_PATH_TEXT(json, field)
     FROM (
         SELECT id,
                json,
                REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
         FROM exploded_array
     );
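
Because the join only keeps rows where seq.i is smaller than the array length, an array longer than the sequence silently loses its trailing elements. A minimal sanity check, assuming the input_table and seq_0_to_3 names used in the steps above, could look like this:

```sql
-- Compare the longest array in the table with the number of rows in the
-- sequence view. If max_elements is greater than seq_rows, the sequence
-- is too short and some array elements would be dropped by the join.
SELECT
    (SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table) AS max_elements,
    (SELECT COUNT(*) FROM seq_0_to_3) AS seq_rows;
```

For the sample data this should report 3 elements against 4 sequence rows, so the view is large enough.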

It's now possible in Redshift to treat strings in either array format [] or json format {} as parsable json structures. First let's make a temp table based on your data:

create temporary table #t1 (id int, json_str varchar(100));
truncate table #t1;
insert into #t1 values (1, '[{"pet":"dog"},{"country":"uk"}]');
insert into #t1 values (2, '[{"pet":"cat"}]');
insert into #t1 values (3, '[]');
insert into #t1 values (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]');

The common table expression (cte) below uses json_parse to convert the json_str field into a formal json structure of SUPER type. If the table's field were already SUPER type, we could skip this step.

drop table if exists #t2;
create temporary table #t2 as
with cte as 
    (select 
        x.id,
        json_parse(x.json_str) as json_str -- convert string to SUPER structure 
    from
        #t1 x
    )
select
    x.id
    ,unnested
from
    cte x, x.json_str as unnested -- an alias of cte and x.json_str is required!
order by 
    id
;
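
json_parse raises an error when it meets a string that is not valid json, so it can be worth checking the input first. The sketch below assumes the can_json_parse function is available on your cluster and reuses the #t1 table from above:

```sql
-- List rows whose json_str would make json_parse fail.
-- An empty result means every row can be converted to SUPER safely.
select
    x.id
    ,x.json_str
from
    #t1 x
where
    not can_json_parse(x.json_str)
;
```

With the four sample rows this should return nothing, since all of them are well-formed arrays.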

Now we have an exploded list of key/value pairs to easily extract:

select 
    t2.id
    ,json_key -- this is the extracted key
    ,cast(json_val as varchar) as json_val -- eliminates the double quote marks
from
    #t2 t2, unpivot t2.unnested as json_val at json_key --"at some_label" (e.g. json_key) will extract the key
order by
    id
;

A different way to render the info is to allow the parsing engine to turn keys into columns. This isn't what you asked for, but potentially interesting:

select 
    id
    ,cast(t2.unnested.country as varchar) -- data is already parsed into rows, so it's directly addressable now
    ,cast(t2.unnested.education as varchar)
    ,cast(t2.unnested.pet as varchar)
from
    #t2 t2
;

If you want more info on this, use a search engine to search for parsing the SUPER data type. If the data already existed as SUPER in the Redshift table, these latter 2 queries would work natively against the table, with no need for a temp table.
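
As a sketch of that last point, here is what it might look like when the column is stored as SUPER from the start. The #t3 table and its metadata column are hypothetical names chosen for illustration, and the unnest and unpivot steps are chained in a single from clause instead of going through an intermediate table:

```sql
-- Hypothetical temp table that stores the array natively as SUPER.
create temporary table #t3 (id int, metadata super);
insert into #t3 values (1, json_parse('[{"pet":"dog"},{"country":"uk"}]'));
insert into #t3 values (2, json_parse('[{"pet":"cat"}]'));
insert into #t3 values (3, json_parse('[]'));
insert into #t3 values (4, json_parse('[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'));

-- Unnest each array element, then unpivot the element into key/value pairs.
select
    t.id
    ,json_key -- the extracted key
    ,cast(json_val as varchar) as json_val -- the value without double quotes
from
    #t3 t, t.metadata as elem, unpivot elem as json_val at json_key
order by
    t.id
;
```

Row 3 has an empty array, so it contributes no rows to the result, which matches the output requested in the question.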

There is a generic version of the seq_0_to_3 view. Let's call it seq_0_to_n. It can be generated by:

CREATE VIEW seq_0_to_n AS (  
    SELECT row_number() over (
                          ORDER BY TRUE)::integer - 1 AS i
    FROM <insert_large_enough_table> LIMIT <number_less_than_table_entries>);

This helps in generating large sequences as a view.
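
If no suitably large table is at hand, a recursive common table expression is another way to produce the sequence. This is only a sketch: it assumes WITH RECURSIVE is supported on your cluster, it reuses the input_table and metadata names from the question, and the upper bound of 99 is an arbitrary illustrative value that must be at least the longest array length minus one.

```sql
-- Generate the integers 0..99 recursively, then use them to explode the arrays.
WITH RECURSIVE seq(i) AS (
    SELECT 0::integer
    UNION ALL
    SELECT i + 1 FROM seq WHERE i < 99
)
SELECT
    t.id,
    JSON_EXTRACT_ARRAY_ELEMENT_TEXT(t.metadata, seq.i) AS json
FROM input_table AS t
JOIN seq ON seq.i < JSON_ARRAY_LENGTH(t.metadata);
```

This keeps the sequence inside the query itself, so nothing has to be created ahead of time.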
