Create Table in Athena From Nested JSON

I have nested JSON of the following form:

[{
    "emails": [{
        "label": "",
        "primary": "",
        "relationdef_id": "",
        "type": "",
        "value": ""
    }],
    "licenses": [{
        "allocated": "",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }, {
        "allocated": "",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }]
}, {
    "emails": [{
        "label": "",
        "primary": "",
        "relationdef_id": "",
        "type": "",
        "value": ""
    }],
    "licenses": [{
        "allocated": "2016-04-26 01:46:26",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }]
}]

which I am not able to convert into an Athena table.

I have also tried changing it to a sequence of objects:

{
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            },{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }
    {
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }
    {
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }

with the query:

CREATE EXTERNAL TABLE `test_orders1`(
  `emails` array<struct<`label`: string, `primary`: string,`relationdef_id`: string,`type`: string, `value`: string>>,
  `licenses` array<struct<`allocated`: string, `parent_type`: string, `parentid`: string, `product_type`: string,`purchased_license_id`: string, `service_type`: string>>) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

but only one row is created. Is there a way to use nested JSON of type JSONArray in an Athena table? Or how can I change the nested JSON so that it works?

When querying JSON data, Athena requires the files to be formatted with one JSON document per line. It's unclear from your question whether this is the case: the examples you give are multiline, but perhaps that's only to make the question clearer.
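If the data currently looks like your first example (a single top-level JSON array), one way to get to one document per line is to rewrite the file before uploading it. The following is only a minimal sketch using Python's standard json module; the function name array_to_json_lines and the sample string are made up for illustration, and reading from or writing to S3 is left out.

```python
import json


def array_to_json_lines(text):
    """Turn a top-level JSON array into one compact JSON document per line.

    Hypothetical helper for illustration; not part of Athena or any AWS SDK.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without an indent argument never emits newlines,
    # so each array element ends up on exactly one line.
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # Shortened, made-up sample with the same shape as the question's data.
    sample = """
    [{
        "emails": [{"label": "", "value": ""}],
        "licenses": [{"allocated": "", "service_type": ""}]
    }, {
        "emails": [{"label": "", "value": ""}],
        "licenses": [{"allocated": "2016-04-26 01:46:26", "service_type": ""}]
    }]
    """
    print(array_to_json_lines(sample), end="")
```

Each element of the array becomes its own line, so each element should then map to one row in the table.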

The table DDL you include looks like it should work on the second example data, provided that it is formatted as one document per line, e.g.:

{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}, { "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
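If the second example really is stored as pretty-printed objects placed one after another, it can be compacted into the one-document-per-line layout with a small script. This is a sketch under that assumption, using json.JSONDecoder.raw_decode from the Python standard library to read consecutive objects; the function name concatenated_to_json_lines is hypothetical.

```python
import json


def concatenated_to_json_lines(text):
    """Rewrite back-to-back (possibly multiline) JSON documents as one per line.

    Hypothetical helper for illustration; assumes the input holds only
    JSON documents separated by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    lines = []
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it stopped.
        document, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(document))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Shortened, made-up sample in the multiline layout from the question.
    sample = """
    {
        "emails": [{"label": "", "value": ""}],
        "licenses": [{"allocated": ""}, {"allocated": ""}]
    }
    {
        "emails": [{"label": "", "value": ""}],
        "licenses": [{"allocated": ""}]
    }
    """
    print(concatenated_to_json_lines(sample), end="")
```

The output file, not the original, is what would be placed under the table's S3 location.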
