简体   繁体   English

从嵌套的 JSON 在 Athena 中创建表

[英]Create Table in Athena From Nested JSON

I have nested JSON of type我有嵌套的 JSON 类型

[{
    "emails": [{
        "label": "",
        "primary": "",
        "relationdef_id": "",
        "type": "",
        "value": ""
    }],
    "licenses": [{
        "allocated": "",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }, {
        "allocated": "",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }]
}, {
    "emails": [{
        "label": "",
        "primary": "",
        "relationdef_id": "",
        "type": "",
        "value": ""
    }],
    "licenses": [{
        "allocated": "2016-04-26 01:46:26",
        "parent_type": "",
        "parentid": "",
        "product_type": "",
        "purchased_license_id": "",
        "service_type": ""
    }]
}]

which is not able to be converted to athena table.无法转换为雅典娜表。

I have tried to update it to list of objects also我也尝试将其更新为对象列表

{
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            },{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }
    {
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }
    {
        "emails": [{
                "label": "",
                "primary": "",
                "relationdef_id": "",
                "type": "",
                "value": ""
            }
        ],
        "licenses": [{
                "allocated": "",
                "parent_type": "",
                "parentid": "",
                "product_type": "",
                "purchased_license_id": "",
                "service_type": ""
            }
        ]
    }

with Query:带查询:

CREATE EXTERNAL TABLE `test_orders1`(
  `emails` array<struct<`label`: string, `primary`: string,`relationdef_id`: string,`type`: string, `value`: string>>,
  `licenses` array<struct<`allocated`: string, `parent_type`: string, `parentid`: string, `product_type`: string,`purchased_license_id`: string, `service_type`: string>>) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

but only 1 row is formed.但只形成了 1 行。 Is there a way where i can use Nested json of type JSONArray into Athena table?有没有办法可以将 JSONArray 类型的嵌套 json 使用到 Athena 表中? Or how can I change Nested Json that will work for me?或者如何更改对我有用的嵌套 Json?

When querying JSON data Athena requires the files to be formatted with one JSON document per line.查询 JSON 数据时,Athena 要求将文件格式化为每行一个 JSON 文档。 It's unclear from your question if this is the case or not, the examples you give are multiline, but perhaps that's only to make the question more clear.从您的问题中不清楚是否是这种情况,您提供的示例是多行的,但这也许只是为了使问题更清楚。

The table DDL you include looks like it should work on the second example data, provided that it is formatted as one document per line, eg您包含的表 DDL 看起来应该适用于第二个示例数据,前提是它的格式为每行一个文档,例如

{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}, { "allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}
{"emails": [{"label": "", "primary": "", "relationdef_id": "", "type": "", "value": ""}], "licenses": [{"allocated": "", "parent_type": "", "parentid": "", "product_type": "", "purchased_license_id": "", "service_type": ""}]}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM