Merging multiple JSON files into a single JSON file in S3 from an AWS Lambda Python function

I am stuck on a task where the requirement is to combine multiple JSON files into a single JSON file and compress it into an S3 folder.

I managed to do this somehow, but the JSON contents get merged as dictionary keys. I know this is because I used a dictionary to load the JSON content from the files; when I tried loading it as a list instead, it threw a JSONDecodeError: "Extra data: line 1 column 432 (char 431)".

My files look like the following. file1 (note: the files have no .json extension):

{"abc":"bcd","12354":"31354321"}

file 2

{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}

My code:

import json
import boto3

s3_client = boto3.client('s3')

bucket_name = '<my bucket>'

def lambda_handler(event, context):
    key = '<Bucket key>'
    jsonfilesname = ['<name of the json files which stored in list>']
    result = []
    json_data = {}
    for f in range(len(jsonfilesname)):
        s3_client.download_file(bucket_name, key + jsonfilesname[f], '/tmp/' + key + jsonfilesname[f])
        infile = open('/tmp/' + jsonfilesname[f]).read()
        json_data[infile] = result
    with open('/tmp/merged_file', 'w') as outfile:
        json.dump(json_data, outfile)

The output written to the outfile by the above code is:

{
    "{\"abc\":\"bcd\",\"12354\":\"31354321\"}": [],
    "{\"abc\":\"bcd\",\"12354\":\"31354321\":\"hqeddeqf\":\"5765354\"}": []
}

My expected output is:

{"abc":"bcd","12354":"31354321"},{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}

Could someone please help and advise on what needs to be done to get my expected output?

First of all:

file 2 is not a valid JSON file; correctly, it should be:

{
    "abc": "bcd",
    "12354": "31354321",
    "hqeddeqf": "5765354"
}

Also, the expected output is not valid JSON; what you would expect after merging two JSON files is an array of JSON objects:

[
    {
        "abc": "bcd",
        "12354": "31354321"
    },
    {
        "abc": "bcd",
        "12354": "31354321",
        "hqeddeqf": "5765354"
    }
]

Knowing this, we could write a Lambda to merge JSON files:

import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = '...'
    jsonfilesname = ['file1.json', 'file2.json']
    result = []
    for key in jsonfilesname:
        # Read each object directly from S3 instead of downloading it to /tmp
        data = s3.get_object(Bucket=bucket, Key=key)
        content = json.loads(data['Body'].read().decode("utf-8"))
        result.append(content)

    # Do something with the merged content
    print(json.dumps(result))
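Since the original requirement also mentions compressing the merged result into an S3 folder, here is a minimal sketch of how the merged list could be gzipped in memory and written back to S3. The output key name below is a hypothetical placeholder:

import gzip
import json
import boto3

s3 = boto3.client('s3')

def write_merged_gzip(result):
    # Bucket and key are placeholders; adjust them to your layout
    bucket = '...'
    key = 'merged/merged_file.json.gz'
    # Serialize the merged list of objects and gzip-compress it in memory
    body = gzip.compress(json.dumps(result).encode('utf-8'))
    s3.put_object(Bucket=bucket, Key=key, Body=body,
                  ContentType='application/json', ContentEncoding='gzip')

Calling write_merged_gzip(result) at the end of the handler also avoids writing an uncompressed intermediate file to /tmp.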

If you are using AWS, I would recommend S3DistCp for JSON file merging, as it provides a fault-tolerant, distributed way of merging that scales to large files by leveraging MapReduce. However, it does not seem to support in-place merging.
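For reference, an S3DistCp step can be submitted to an existing EMR cluster from Python with boto3. This is a minimal sketch; the cluster ID, S3 paths, and --groupBy pattern are assumptions to be adapted:

import boto3

emr = boto3.client('emr')

# 'j-XXXXXXXX' and the S3 paths below are hypothetical placeholders
emr.add_job_flow_steps(
    JobFlowId='j-XXXXXXXX',
    Steps=[{
        'Name': 'Merge JSON files with S3DistCp',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': [
                's3-dist-cp',
                '--src=s3://my-bucket/input/',
                '--dest=s3://my-bucket/merged/',
                # Files whose names match the same capture group are
                # concatenated into a single output file
                '--groupBy=.*(file).*',
                '--outputCodec=gzip',
            ],
        },
    }],
)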
