
Unnesting map values as individual columns in Athena / Presto

My question is somewhat similar to this one (Athena/Presto - UNNEST MAP to columns), but in my case I know beforehand which columns I need.

My use case is this: I have a JSON blob with the following structure:

{
  "reqId" : "1234",
  "clientId" : "client",
  "response" : [
                 {
                   "name" : "Susan",
                   "projects" : [
                       {
                          "name" : "project1",
                          "completed" : true
                       },
                       {
                          "name" : "project2",
                          "completed" : false
                       }
                   ]
                 },
                 {
                   "name" : "Adams",
                   "projects" : [
                       {
                          "name" : "project1",
                          "completed" : true
                       },
                       {
                          "name" : "project2",
                          "completed" : false
                       }
                   ]
                 }
               ]
}

I need to create a view that returns output like this:

    name  |  project    |  Completed |
----------+-------------+------------+
    Susan |  project1   |   true     |
    Susan |  project2   |   false    |
    Adams |  project1   |   true     |
    Adams |  project2   |   false    |

I tried the following, among other approaches. This is the closest I could get:

WITH dataset AS (
  SELECT 'Susan' as name, transform(filter(CAST(json_extract('{
           "projects": [{"name":"project1", "completed":false}, {"name":"project3", "completed":false},
           {"name":"project2", "completed":true}]}', '$.projects') AS ARRAY<MAP<VARCHAR, VARCHAR>>), p -> (p['name'] != 'project1')), p -> ROW(map_values(p))) AS projects
)
SELECT * from dataset
CROSS JOIN UNNEST(projects)

This is the output I am getting:

    name    projects                                                   _col2
1   Susan   [{field0=[project3, false]}, {field0=[project2, true]}]    {field0=[project3, false]}
2   Susan   [{field0=[project3, false]}, {field0=[project2, true]}]    {field0=[project2, true]}

Basically, I want to unnest the key-value pairs of my map into separate columns. How do I do this in Presto / Athena?

Your JSON example seems to be invalid: it is missing a , after "name": "Susan" and "name": "Adams". Apart from that, you can get your expected output with the query below. You need to UNNEST twice, and some casting is also required:

with dataset as
(
    select json_parse('{"reqId" : "1234","clientId" : "client","response" : [{"name" : "Susan","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]},{"name" : "Adams","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]}]}') as json_col
)
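,unnest_response as
(
    -- first UNNEST: one row per element of the top-level "response" array
    select *
    from dataset
    cross join UNNEST(cast(json_extract(json_col, '$.response') as array<JSON>)) as t (response)
)
select
    json_extract_scalar(response, '$.name') as name,
    json_extract_scalar(project, '$.name') as project_name,
    -- json_extract_scalar returns varchar, so this is the string 'true' or 'false'
    json_extract_scalar(project, '$.completed') as project_completed
from unnest_response
-- second UNNEST: one row per element of each person's "projects" array
cross join UNNEST(cast(json_extract(response, '$.projects') as array<JSON>)) as t (project);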
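If you want to sanity-check the expected rows outside Athena, the two UNNEST steps above can be mirrored as two nested loops in plain Python. This is only a sketch for illustration, not Presto code: the function name flatten_projects and the PAYLOAD constant are made up for the example, and the sample document is the one from the question.

```python
import json
from typing import List, Tuple

# Sample document copied from the question (PAYLOAD is a name chosen for this sketch).
PAYLOAD = """
{"reqId": "1234", "clientId": "client", "response": [
  {"name": "Susan", "projects": [
    {"name": "project1", "completed": true},
    {"name": "project2", "completed": false}]},
  {"name": "Adams", "projects": [
    {"name": "project1", "completed": true},
    {"name": "project2", "completed": false}]}
]}
"""


def flatten_projects(blob: str) -> List[Tuple[str, str, bool]]:
    """Return one (name, project_name, completed) row per project.

    Hypothetical helper: it mirrors the two CROSS JOIN UNNEST steps
    as two nested loops over the parsed JSON.
    """
    doc = json.loads(blob)
    rows = []
    for person in doc["response"]:          # like the first UNNEST over $.response
        for project in person["projects"]:  # like the second UNNEST over $.projects
            rows.append((person["name"], project["name"], project["completed"]))
    return rows


if __name__ == "__main__":
    for row in flatten_projects(PAYLOAD):
        print(row)
```

One difference to keep in mind: the SQL query returns project_completed as varchar, because json_extract_scalar always returns a string, whereas this sketch keeps the JSON boolean.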


 