Multi-line JSON file querying in Hive
I understand that the majority of JSON SerDe formats expect .json files to be stored with one record per line.
I have an S3 bucket with multi-line, indented .json files (I don't control the source) that I'd like to query using Amazon Athena (though I suppose this applies just as well to Hive generally).
Is there a SerDe format that supports these multi-line .json files?

Example file body:
[
  {
    "id": 1,
    "name": "ryan",
    "stuff": {
      "x": true,
      "y": [
        123,
        456
      ]
    }
  },
  ...
]
Unfortunately, there is no SerDe that supports multi-line JSON content. There is the specialized CloudTrail SerDe, which supports a format similar to yours, but it is hard-coded for the CloudTrail JSON format only. It does, however, show that this is at least theoretically possible. Currently, though, there is no way to write your own SerDes for use with Athena.
You won't be able to consume these files with Athena directly. You will first have to use EMR, Glue, or some other tool to reformat them into JSON stream files (one record per line).
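As an illustration of that reformatting step, here is a minimal local sketch in Python using only the standard library. It assumes each file's top level is a JSON array of records, as in the example above; the function names `to_json_lines` and `convert_file` are hypothetical, and fetching from and writing back to S3 (in Glue, EMR, or elsewhere) is left out.

```python
import json


def to_json_lines(text: str) -> str:
    """Convert a JSON document whose top level is an array of records
    (possibly pretty-printed over many lines) into newline-delimited JSON,
    with one compact record per line."""
    records = json.loads(text)
    # Assumption: the top level is a list; a lone object is wrapped so it
    # still becomes a single line.
    if not isinstance(records, list):
        records = [records]
    # json.dumps without indent never emits raw newlines (newlines inside
    # strings are escaped as \n), so each record fits on exactly one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read one multi-line JSON file and write its JSON Lines equivalent."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_json_lines(text))


if __name__ == "__main__":
    sample = """
    [
      {
        "id": 1,
        "name": "ryan",
        "stuff": {"x": true, "y": [123, 456]}
      }
    ]
    """
    # Prints: {"id":1,"name":"ryan","stuff":{"x":true,"y":[123,456]}}
    print(to_json_lines(sample), end="")
```

This reads each whole file into memory, which is fine for modestly sized files; very large arrays would need a streaming JSON parser instead. The resulting one-record-per-line files are the layout that line-oriented JSON SerDes (such as the ones Athena supports) expect.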