
Convert standard JSON file to json-serde format using Python & upload to AWS S3 bucket for Amazon Athena (Presto, Hive)

I am trying to convert a JSON file to json-serde format and upload the result to an AWS S3 bucket using Python, so that Amazon Athena (Presto/Hive) can read the file from the bucket.

Per the AWS product documentation, a typical JSON file is not a valid format for Athena; the file needs to be in json-serde format, with each record on its own line: https://docs.aws.amazon.com/athena/latest/ug/json-serde.html

Locally, I am able to convert a JSON file to json-serde format using the code below:

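import json
# Load the original file, whose top level is a JSON array of records
with open('xx_original_file.json','r',encoding='utf-8') as json_file:
    data = json.load(json_file)
# Serialize each record to a compact, single-line JSON string
result = [json.dumps(record) for record in data]
# Write one record per line
with open('xx_new_file.json', 'w') as obj:
    for i in result:
        obj.write(i+'\n')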
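To make the target format concrete, here is a minimal sketch of the same conversion applied to in-memory data. The helper name `to_json_lines` and the sample records are hypothetical, chosen only for illustration; the sample stands in for whatever `json.load` returns from the original file.

```python
import json

def to_json_lines(records):
    """Serialize each record onto its own line (hypothetical helper)."""
    # json.dumps without indent never emits a raw newline,
    # so each record occupies exactly one line.
    return "".join(json.dumps(record) + "\n" for record in records)

# Hypothetical sample standing in for the parsed contents of xx_original_file.json.
sample = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

text = to_json_lines(sample)
print(text, end="")
# {"id": 1, "name": "a"}
# {"id": 2, "name": "b"}

# Every line is a complete JSON document that parses on its own.
round_trip = [json.loads(line) for line in text.splitlines()]
```

The array brackets and the commas between records are gone, so a reader can process the file one line at a time.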

Is there an equivalent way to do this in Python that lets me store the new json-serde file in an S3 bucket? The script I have built so far keeps raising an error:

import json
import os
import boto3

s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'
response = s3.get_object(Bucket=bucket,Key=key)
content = response['Body']
jsonObject = json.loads(content.read())
result = [json.dumps(record) for record in jsonObject]
new_results = []
for i in result:
    new_results.append(i+'\n')
new_key = 'xx_new_file.json'
s3.put_object(Bucket=bucket,Key=new_key,Body=new_results)

Error Message: ParamValidationError: Parameter validation failed: Invalid type for parameter Body, value: {json data} type: <class 'list'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

This was an easy fix: I needed to join the list into a single string and then encode that string to bytes.

import json
import boto3
s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'
# Download the original file and parse it (top level is a JSON array)
response = s3.get_object(Bucket=bucket,Key=key)
content = response['Body']
jsonObject = json.loads(content.read())
# One record per line, joined into a single string
result = "\n".join([json.dumps(record) for record in jsonObject])
# put_object expects bytes, a bytearray, or a file-like object for Body
body = result.encode('utf-8')
new_bucket = 'my_bucket_name'
new_key = 'xx_new_file.json'
s3.put_object(Bucket=new_bucket,Key=new_key,Body=body)
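The fix above could also be wrapped in a reusable function, sketched below under a few assumptions. The names `json_array_to_lines` and `convert_and_upload` are hypothetical, and two details are additions not present in the original answer: a trailing newline after the last record, and a `ValueError` when the top-level JSON value is not an array. The client is passed in as a parameter so the conversion can be exercised without AWS access.

```python
import json

def json_array_to_lines(raw):
    """Convert the bytes of a JSON array into one-record-per-line bytes.

    Hypothetical helper; the list check is an added assumption.
    """
    records = json.loads(raw)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    # Join records with newlines and end the file with a newline (assumption).
    text = "".join(json.dumps(record) + "\n" for record in records)
    # Encode to bytes, one of the types put_object accepts for Body.
    return text.encode("utf-8")

def convert_and_upload(s3_client, bucket, src_key, dst_key):
    """Read src_key from the bucket, convert it, and write it to dst_key."""
    response = s3_client.get_object(Bucket=bucket, Key=src_key)
    body = json_array_to_lines(response["Body"].read())
    s3_client.put_object(Bucket=bucket, Key=dst_key, Body=body)
```

With boto3 installed and credentials configured, a call would look like `convert_and_upload(boto3.client('s3'), 'my_bucket_name', 'xx_original_file.json', 'xx_new_file.json')`.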

