
Merging multiple JSON files into single JSON file in S3 from AWS Lambda python function


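I am stuck on a task: I need to combine multiple JSON files into a single JSON file and then compress it into an S3 folder.

I got it partly working, but the JSON contents are being merged into a dictionary. I know this is because I used a dictionary to hold the JSON content loaded from the files; when I tried loading it as a list instead, it threw a JSONDecodeError "Extra data:line 1 column 432(431)".

My files look like this: file1 (the files have no .json extension)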

{"abc":"bcd","12354":"31354321"}

file2

{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}

My code:

import json
import boto3

s3_client=boto3.client('s3')

bucket_name='<my bucket>'

def lambda_handler(event,context):
 key='<Bucket key>'
 jsonfilesname = ['<name of the json files which stored in list>']
 result=[]
 json_data={}
 for f in (range(len(jsonfilesname))):
  s3_client.download_file(bucket_name,key+jsonfilesname[f],'/tmp/'+key+jsonfilesname[f])
  infile = open('/tmp/'+jsonfilesname[f]).read()
  json_data[infile] = result
 with open('/tmp/merged_file','w') as outfile:
  json.dump(json_data,outfile)

The output file produced by the code above is:

{
"{"abc":"bcd","12354":"31354321"}: []",
"{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"} :[]"
}

What I expect is:

{"abc":"bcd","12354":"31354321"},{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}

Could someone please help and suggest what I need to do to get my expected output?

First:

file2 is not a valid JSON file. The correct form would be:

{
    "abc": "bcd",
    "12354": "31354321",
    "hqeddeqf": "5765354"
}

Also, the expected output is not valid JSON either. What you would expect after merging two JSON files is an array of JSON objects:

[
    {
        "abc": "bcd",
        "12354": "31354321"
    },
    {
        "abc": "bcd",
        "12354": "31354321",
        "hqeddeqf": "5765354"
    }
]
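As a quick local check of the point above, the following sketch parses the two corrected objects joined by a comma (the shape the question expected) and then the same objects wrapped in a list. The variable names are illustrative only, and the strings stand in for the file contents.

```python
import json

# Stand-ins for the contents of file1 and the corrected file2.
first = '{"abc": "bcd", "12354": "31354321"}'
second = '{"abc": "bcd", "12354": "31354321", "hqeddeqf": "5765354"}'

# Two objects separated by a comma are not a single JSON document,
# so the parser stops after the first object and reports extra data.
try:
    json.loads(first + "," + second)
    concatenated_ok = True
except json.JSONDecodeError as err:
    concatenated_ok = False
    error_message = err.msg

# Wrapping the parsed objects in a list yields one valid JSON document.
merged_text = json.dumps([json.loads(first), json.loads(second)])
merged = json.loads(merged_text)
```

This is the same kind of "Extra data" failure the question reports when more than one object is handed to the parser at once.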

Knowing this, we can write a Lambda to merge the JSON files:

import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event,context):
    bucket = '...'
    jsonfilesname = ['file1.json', 'file2.json']
    result=[]
    for key in jsonfilesname:
        data = s3.get_object(Bucket=bucket, Key=key)
        content = json.loads(data['Body'].read().decode("utf-8"))
        result.append(content)

    # Do something with the merged content
    print(json.dumps(result))
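The question also asks for the merged file to be compressed and stored back in S3, which the handler above leaves open. One possible way to do that, as a minimal sketch: gzip the merged array in memory and upload it with `put_object`. The function name `merge_json_objects`, the `out_key` parameter, and the choice of gzip are assumptions for illustration, not part of the original answer; `s3` is expected to be a client such as `boto3.client('s3')`.

```python
import gzip
import json


def merge_json_objects(s3, bucket, keys, out_key):
    """Merge one-JSON-document-per-key objects into a list and upload it gzipped.

    `s3` is assumed to be an S3 client (for example boto3.client('s3')).
    `out_key` is a hypothetical destination key chosen by the caller.
    """
    merged = []
    for key in keys:
        # Read the object body and parse it as a single JSON document.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        merged.append(json.loads(body.decode("utf-8")))

    # Serialize the list as one JSON array and compress it in memory.
    payload = gzip.compress(json.dumps(merged).encode("utf-8"))

    # Upload the compressed bytes; the metadata marks the content as gzipped JSON.
    s3.put_object(
        Bucket=bucket,
        Key=out_key,
        Body=payload,
        ContentType="application/json",
        ContentEncoding="gzip",
    )
    return merged
```

Inside the handler this could be called as `merge_json_objects(s3, bucket, jsonfilesname, 'merged.json.gz')`. Working in memory avoids writing to /tmp, which is reasonable as long as the merged content fits in the function's memory.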

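If you are on AWS, I would suggest using S3DistCp to merge the JSON files, since it provides a fault-tolerant, distributed approach that can keep up with large files by leveraging MapReduce. However, it does not seem to support in-place merging.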
