
Convert standard JSON file to json-serde format using Python & upload to AWS S3 bucket for Amazon Athena (Presto, Hive)

I am trying to convert a JSON file to json-serde format and upload the result to an AWS S3 bucket using Python, so that Amazon Athena (Presto/Hive) can read the file from the bucket.

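Per the AWS product documentation, a typical JSON file (for example, a pretty-printed file or one top-level array of records) is not a valid format for Athena; the file needs to be in json-serde format, with each record written as a JSON object on its own line: https://docs.aws.amazon.com/athena/latest/ug/json-serde.html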
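To make the difference concrete, here is a small sketch of the two layouts. The sample records are made up for illustration; the only assumption about Athena is the one from the linked page, that each record should sit on a single line.

```python
import json

# Hypothetical sample records, only for illustration.
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# A "typical" JSON file: one top-level array, pretty-printed over many lines.
typical = json.dumps(records, indent=2)

# The line-per-record layout: every record is a compact object on its own line.
json_lines = "\n".join(json.dumps(record) for record in records)

print(json_lines)
# {"id": 1, "name": "a"}
# {"id": 2, "name": "b"}
```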

Locally, I am able to convert a json file to json-serde format using the below code:

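import json

# Load the whole file, which is expected to hold a top-level JSON array of records.
with open('xx_original_file.json', 'r', encoding='utf-8') as json_file:
    data = json.load(json_file)

# Write each record as one compact JSON object per line.
with open('xx_new_file.json', 'w', encoding='utf-8') as obj:
    for record in data:
        obj.write(json.dumps(record) + '\n')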

Is there an equivalent way to do this in Python that will allow me to store a new json-serde file in an s3 bucket? I keep getting an error with the Python script I have built so far:

import json
import os
import boto3

s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'
response = s3.get_object(Bucket=bucket,Key=key)
content = response['Body']
jsonObject = json.loads(content.read())
result = [json.dumps(record) for record in jsonObject]
new_results = []
for i in result:
    new_results.append(i+'\n')
new_key = 'xx_new_file.json'
s3.put_object(Bucket=bucket,Key=new_key,Body=new_results)

Error Message: ParamValidationError: Parameter validation failed: Invalid type for parameter Body, value: {json data} type: <class 'list'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

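This turned out to be an easy fix. The Body parameter of put_object does not accept a list, so I needed to join the list into a single newline-separated string and then encode that string to bytes.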

import json
import boto3

s3 = boto3.client('s3')
bucket = 'my_bucket_name'
key = 'xx_original_file.json'

# Download the original object and parse it as a JSON array of records.
response = s3.get_object(Bucket=bucket, Key=key)
jsonObject = json.loads(response['Body'].read())

# One JSON object per line, joined into a single string and encoded to bytes,
# which is one of the types put_object accepts for Body.
result = "\n".join(json.dumps(record) for record in jsonObject)
body = result.encode('utf-8')

new_bucket = 'my_bucket_name'
new_key = 'xx_new_file.json'
s3.put_object(Bucket=new_bucket, Key=new_key, Body=body)
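If this conversion is needed for more than one file, the same steps could be wrapped in a pair of small functions. This is only a sketch under some assumptions: the names to_json_lines and convert_s3_object are hypothetical, the input is assumed to be either a JSON array of records or a single JSON object, and the client passed in is assumed to behave like boto3.client('s3') (only get_object and put_object are used).

```python
import json


def to_json_lines(raw):
    """Turn bytes holding a JSON array (or one object) into line-per-record bytes."""
    records = json.loads(raw)
    # Assumption: a lone top-level object is treated as a single record.
    if isinstance(records, dict):
        records = [records]
    # Terminate every record with a newline; an empty array gives empty output.
    text = "".join(json.dumps(record) + "\n" for record in records)
    return text.encode("utf-8")


def convert_s3_object(s3_client, bucket, key, new_key):
    """Read bucket/key, convert it, and write the result to bucket/new_key."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    body = to_json_lines(response["Body"].read())
    s3_client.put_object(Bucket=bucket, Key=new_key, Body=body)
    return body
```

With boto3 installed and credentials configured, a call might look like convert_s3_object(boto3.client('s3'), 'my_bucket_name', 'xx_original_file.json', 'xx_new_file.json'). Keeping the conversion separate from the S3 calls means it can be checked locally without touching a bucket.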
