
JSON Flattening and table creation

Could somebody help me write a SQL statement to flatten JSON data in Snowflake? The table Table1 has a single JSON_DATA column, and the JSON in it contains an array.

JSON Data

{
  "id": "1234-567-890",
  "parent_id": "00-123-safsf-3345",
  "data": [
    {
      "id": "sfsfd-234-fgf-55-4545",
      "values": [
        {
          "name": "one",
          "value": "32"
        },
        {
          "name": "Two",
          "value": "MMAD"
        },
        {
          "name": "three",
          "value": ""
        },
        {
          "name": "four",
          "value": "Bacra-Dacra"
        },
        {
          "name": "five",
          "value": "33-5455-9"
        },
        {
          "name": "six",
          "value": ""
        },
        {
          "name": "seven",
          "value": "4056"
        },
        {
          "name": "eight",
          "value": "TUU-WWW"
        },
        {
          "name": "nine",
          "value": ""
        },
        {
          "name": "ten",
          "value": "234234"
        }
      ]
    },
    {
      "id": "asdfsdfsdf-23423-fsff-3445435",
      "values": [
        {
          "name": "One",
          "value": "32"
        },
        {
          "name": "Two",
          "value": "MMDI"
        },
        {
          "name": "Three",
          "value": ""
        },
        {
          "name": "four",
          "value": "THis is a Test"
        },
        {
          "name": "five",
          "value": "11-4543535-2"
        },
        {
          "name": "six",
          "value": ""
        },
        {
          "name": "seven",
          "value": "4056"
        },
        {
          "name": "eight",
          "value": "ert erte"
        },
        {
          "name": "nine",
          "value": ""
        },
        {
          "name": "ten",
          "value": "343534"
        }
      ]
    }
  ]
}

Table Format required:

id | one | two | three | four | five | six | seven | eight | nine | ten
sfsfd-234-fgf-55-4545 | 32 | MMAD | | Bacra-Dacra | 33-5455-9 | 4056 | TUU-WWW | | 234234 |
asdfsdfsdf-23423-fsff-3445435 | 32 | MMDI | | THis is a Test | 11-4543535-2 | 4056 | ert erte | | 343534 |

You can do this with a couple of flattens and a pivot. You need to know beforehand which names you want to pivot into columns, because the pivot statement lists them explicitly. In your example data both records have the same 10 names, but you will need to update the pivot statement if some records contain more.
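As a rough illustration of the first step, the sketch below flattens a small inline array so you can see the one-row-per-element shape that the pivot later turns into columns. The inline JSON is a made-up two-element sample in the same shape as the "values" array, and the alias f and the output column names are arbitrary choices.

```sql
-- Sketch: flatten turns each array element into its own row.
-- The inline JSON is a hypothetical two-element sample, not the full data.
select
    f.index                       as array_index,  -- 0-based position of the element
    upper(f.value:name::varchar)  as col_name,     -- upper-cased so "one" and "One" match
    f.value:value::varchar        as col_value
from table(flatten(input => parse_json(
    '[{"name": "one", "value": "32"}, {"name": "Two", "value": "MMAD"}]'
))) f;
```

This should return two rows, one per array element. In the real query the same thing happens twice: once for the outer "data" array and once for each inner "values" array.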

I think you made a mistake in your results table. You've left out the key "six" from the JSON, which has shifted the later values one column to the left. For example, the last column "ten" should contain the numbers 234234 and 343534, but you have them in column "nine". The same applies to every column after "five".

Here is a reproducible example setup:

-- create example source table
create or replace table source_table
(
    json_data variant
);

-- create example target table
create or replace table target_table
(
    id    varchar, -- the sample ids are strings such as 'sfsfd-234-fgf-55-4545', not numbers
    one   varchar,
    two   varchar,
    three varchar,
    four  varchar,
    five  varchar,
    six   varchar,
    seven varchar,
    eight varchar,
    nine  varchar,
    ten   varchar
);

-- Insert json data into source table
insert overwrite into source_table (json_data)
select
    parse_json('{
  "id": "1234-567-890",
  "parent_id": "00-123-safsf-3345",
  "data": [
    {
      "id": "sfsfd-234-fgf-55-4545",
      "values": [
        {
          "name": "one",
          "value": "32"
        },
        {
          "name": "Two",
          "value": "MMAD"
        },
        {
          "name": "three",
          "value": ""
        },
        {
          "name": "four",
          "value": "Bacra-Dacra"
        },
        {
          "name": "five",
          "value": "33-5455-9"
        },
        {
          "name": "six",
          "value": ""
        },
        {
          "name": "seven",
          "value": "4056"
        },
        {
          "name": "eight",
          "value": "TUU-WWW"
        },
        {
          "name": "nine",
          "value": ""
        },
        {
          "name": "ten",
          "value": "234234"
        }
      ]
    },
    {
      "id": "asdfsdfsdf-23423-fsff-3445435",
      "values": [
        {
          "name": "One",
          "value": "32"
        },
        {
          "name": "Two",
          "value": "MMDI"
        },
        {
          "name": "Three",
          "value": ""
        },
        {
          "name": "four",
          "value": "THis is a Test"
        },
        {
          "name": "five",
          "value": "11-4543535-2"
        },
        {
          "name": "six",
          "value": ""
        },
        {
          "name": "seven",
          "value": "4056"
        },
        {
          "name": "eight",
          "value": "ert erte"
        },
        {
          "name": "nine",
          "value": ""
        },
        {
          "name": "ten",
          "value": "343534"
        }
      ]
    }
  ]
}');

-- Flatten both arrays into one row per name/value pair, then pivot the names into columns
select *
from (
    select
        st.json_data:id::varchar         as main_id,
        st.json_data:parent_id::varchar  as parent_id,
        data.value:id::varchar           as id,
        upper(vals.value:name::varchar)  as col_name,  -- upper-case so "one" and "One" land in the same column
        vals.value:value::varchar        as col_value
    from source_table st,
         lateral flatten(input => st.json_data:data) data,
         lateral flatten(input => data.value:values) vals
)
pivot (max(col_value) for col_name in ('ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX', 'SEVEN', 'EIGHT', 'NINE', 'TEN'));

The above produces results that look like this:

MAIN_ID | PARENT_ID | ID | 'ONE' | 'TWO' | 'THREE' | 'FOUR' | 'FIVE' | 'SIX' | 'SEVEN' | 'EIGHT' | 'NINE' | 'TEN'
1234-567-890 | 00-123-safsf-3345 | asdfsdfsdf-23423-fsff-3445435 | 32 | MMDI | | THis is a Test | 11-4543535-2 | | 4056 | ert erte | | 343534
1234-567-890 | 00-123-safsf-3345 | sfsfd-234-fgf-55-4545 | 32 | MMAD | | Bacra-Dacra | 33-5455-9 | | 4056 | TUU-WWW | | 234234
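The setup creates target_table, but nothing loads it. One possible way to finish is sketched below under two assumptions. First, target_table.id is declared varchar, because sample ids such as sfsfd-234-fgf-55-4545 are not numeric and a number column would reject them. Second, the pivot output is renamed through a column alias list, where the alias p is an arbitrary name. The first statement is an optional check that lists the distinct names, so you can see whether the pivot list still covers everything.

```sql
-- Optional check: list the distinct names so the pivot list can be kept in sync.
select distinct upper(vals.value:name::varchar) as col_name
from source_table st,
     lateral flatten(input => st.json_data:data) data,
     lateral flatten(input => data.value:values) vals;

-- Sketch: load the pivoted rows into target_table.
-- Assumes target_table.id is varchar; the alias list after the pivot
-- renames the quoted columns 'ONE' .. 'TEN' to plain identifiers.
insert into target_table (id, one, two, three, four, five, six, seven, eight, nine, ten)
select id, one, two, three, four, five, six, seven, eight, nine, ten
from (
    select
        data.value:id::varchar           as id,
        upper(vals.value:name::varchar)  as col_name,
        vals.value:value::varchar        as col_value
    from source_table st,
         lateral flatten(input => st.json_data:data) data,
         lateral flatten(input => data.value:values) vals
)
pivot (max(col_value) for col_name in ('ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX', 'SEVEN', 'EIGHT', 'NINE', 'TEN'))
    as p (id, one, two, three, four, five, six, seven, eight, nine, ten);
```

Only id is kept from the non-pivoted columns here, because target_table has no columns for main_id or parent_id. Empty strings in the JSON stay empty strings in the table rather than becoming NULL.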

Add your JSON to S3 (or another storage layer, or inline it), create a stage, and try the method below:

create or replace table DATABASE.STAGE.jsonSrc
(
    src variant
)
as 
select parse_json($1) as src
from DATABASE.STAGE.json_flatten f;

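-- "data" and "values" are arrays, so each path needs an index to reach an element;
-- [0] picks the first element only
select src:id as ID,
src:parent_id as P_ID,
src:data[0].id as DATA_ID,
src:data[0].values[0].name as DV_NAME
from DATABASE.STAGE.jsonSrc;

You can traverse nested JSON objects with '.' notation by chaining attribute names. Array elements need an index such as data[0]; without one the path returns NULL. To get every element rather than just the first, use lateral flatten as in the answer above.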
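Path notation on its own only walks nested objects. Because "data" and "values" are arrays, a path needs an explicit index (or a flatten) to reach their elements. Here is a small self-contained sketch using a made-up, shortened document; the CTE name doc and the output column names are hypothetical.

```sql
-- Sketch: path access into arrays needs an index.
-- The JSON below is a shortened, made-up sample in the same shape as the question's data.
with doc as (
    select parse_json(
        '{"id": "1234-567-890", "data": [{"id": "a-1", "values": [{"name": "one", "value": "32"}]}]}'
    ) as src
)
select
    src:id::varchar                                   as id,             -- top-level key: plain path
    src:data[0].id::varchar                           as first_data_id,  -- first element of the "data" array
    src['data'][0]['values'][0]['name']::varchar      as first_name      -- same idea in bracket notation
from doc;
```

Indexing only returns the one element you ask for. To return every element of both arrays, flatten them as in the first answer.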
