Using Python to convert a standard JSON file to json-serde format and upload it to an AWS S3 bucket for Amazon Athena (Presto, Hive)

Question

I am trying to convert a JSON file to json-serde format and use Python to upload the json-serde file to an AWS S3 bucket, so that Amazon Athena (Presto/Hive) can read the file from the S3 bucket.

According to the AWS product documentation, a typical JSON file is not a valid format; the file needs to be in json-serde format: https://docs.aws.amazon.com/athena/latest/ug/json-serde.html

Locally, I can convert the JSON file to json-serde format with the following code:

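import json

# Load the original file, which holds a JSON array of records.
with open('xx_original_file.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

# Write one JSON document per line.
result = [json.dumps(record) for record in data]
with open('xx_new_file.json', 'w', encoding='utf-8') as obj:
    for i in result:
        obj.write(i + '\n')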
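To make the target layout concrete, here is a minimal sketch of the same conversion applied to a small in-memory list. The helper name `to_json_lines` and the `sample` records are hypothetical stand-ins for the parsed contents of xx_original_file.json; they are not part of the original script.

```python
import json


def to_json_lines(records):
    """Serialize each record as a single-line JSON document followed by a newline."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical sample data, used only for illustration.
sample = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

print(to_json_lines(sample), end="")
# {"id": 1, "name": "a"}
# {"id": 2, "name": "b"}
```

The input is one JSON array; the output has no enclosing brackets and no commas between records, only one object per line.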

Is there an equivalent way in Python to store the new json-serde file in an S3 bucket? The Python script I have built so far keeps failing:

import json
import os
import boto3

s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'
response = s3.get_object(Bucket=bucket,Key=key)
content = response['Body']
jsonObject = json.loads(content.read())
result = [json.dumps(record) for record in jsonObject]
new_results = []
for i in result:
    new_results.append(i+'\n')
new_key = 'xx_new_file.json'
s3.put_object(Bucket=bucket,Key=new_key,Body=new_results)

Error message: ParamValidationError: Parameter validation failed: Invalid type for parameter Body, value: {json data}, type: <class 'list'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

Answer 1

This was a simple fix: I needed to join the list into a single string and then encode that string to bytes.

import json
import boto3

s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'

# Download and parse the original JSON array.
response = s3.get_object(Bucket=bucket, Key=key)
content = response['Body']
jsonObject = json.loads(content.read())

# Join the records into one newline-delimited string, then encode it to bytes,
# because put_object rejects a list as Body.
result = "\n".join([json.dumps(record) for record in jsonObject])
body = result.encode('utf-8')

new_bucket = 'my_bucket_name'
new_key = 'xx_new_file.json'
s3.put_object(Bucket=new_bucket, Key=new_key, Body=body)
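If you want to check the conversion without touching a real bucket, one option is to move the logic into functions that take the S3 client as a parameter. The following is only a sketch under that assumption: `records_to_body` and `convert_object` are hypothetical names, and `s3_client` is expected to be whatever `boto3.client('s3')` returns.

```python
import json


def records_to_body(records):
    """Join records as newline-delimited JSON and encode the result to UTF-8 bytes."""
    text = "\n".join(json.dumps(record) for record in records)
    return text.encode("utf-8")


def convert_object(s3_client, bucket, key, new_key):
    """Read a JSON array from S3 and write it back with one record per line.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    records = json.loads(response["Body"].read())
    body = records_to_body(records)
    # Body is bytes here, which is one of the types put_object accepts.
    s3_client.put_object(Bucket=bucket, Key=new_key, Body=body)
    return body
```

Usage would look like `convert_object(boto3.client('s3'), 'my_bucket_name', 'xx_original_file.json', 'xx_new_file.json')`. Because the client is passed in, a small stand-in object with `get_object` and `put_object` methods is enough to exercise the function in a test.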

Answer 1 (accepted, 2020-08-06 16:00:12)
