Merging multiple JSON files into single JSON file in S3 from AWS Lambda python function

I am stuck on a task at work: I need to combine multiple JSON files into a single JSON file and then compress it into an S3 folder.

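I got it working after a fashion, but the JSON contents are being merged into a dictionary. I know this is because I used a dictionary to hold the JSON content loaded from the files; when I tried loading it as a list instead, it threw a JSONDecodeError: "Extra data: line 1 column 432 (431)".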

My files look like this: file1 (the files have no .json extension)

{"abc":"bcd","12354":"31354321"}

file2

{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}

My code:

import json
import boto3

s3_client=boto3.client('s3')

bucket_name='<my bucket>'

def lambda_handler(event,context):
 key='<Bucket key>'
 jsonfilesname = ['<name of the json files which stored in list>']
 result=[]
 json_data={}
 for f in (range(len(jsonfilesname))):
  s3_client.download_file(bucket_name,key+jsonfilesname[f],'/tmp/'+key+jsonfilesname[f])
  infile = open('/tmp/'+jsonfilesname[f]).read()
  json_data[infile] = result
 with open('/tmp/merged_file','w') as outfile:
  json.dump(json_data,outfile)

The output file produced by the code above contains:

{
"{"abc":"bcd","12354":"31354321"}: []",
"{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"} :[]"
}

What I expect is:

{"abc":"bcd","12354":"31354321"},{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}

Could someone please help and suggest what I need to do to get the expected output?

First of all:

file 2 is not a valid JSON file. The correct version would be:

{
    "abc": "bcd",
    "12354": "31354321",
    "hqeddeqf": "5765354"
}

Also, the expected output is not valid JSON. What you would expect after merging two JSON files is an array of JSON objects:

[
    {
        "abc": "bcd",
        "12354": "31354321"
    },
    {
        "abc": "bcd",
        "12354": "31354321",
        "hqeddeqf": "5765354"
    }
]
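To illustrate why the array form matters, here is a minimal sketch that uses only the standard `json` module. The two sample strings mirror the files from the question (with file 2 corrected), and the variable names are purely illustrative. It shows that two objects joined by a comma fail to parse with the same "Extra data" error reported in the question, while a list of parsed objects serializes cleanly to a JSON array.

```python
import json

# Sample contents, assumed to match the two files from the question
# (file 2 is shown in its corrected form).
file1 = '{"abc":"bcd","12354":"31354321"}'
file2 = '{"abc":"bcd","12354":"31354321","hqeddeqf":"5765354"}'

# Two objects separated by a comma are not a single JSON document:
# the parser finishes the first object and then finds leftover text.
concatenated = file1 + "," + file2
try:
    json.loads(concatenated)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg  # "Extra data"

# Parsing each file on its own and collecting the results in a list
# produces a structure that serializes to a valid JSON array.
merged = [json.loads(file1), json.loads(file2)]
merged_text = json.dumps(merged)
```

Each file has to be parsed individually; the merge happens on the Python objects, not on the raw text.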

Knowing this, we can write a Lambda function to merge the JSON files:

import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event,context):
    bucket = '...'
    jsonfilesname = ['file1.json', 'file2.json']
    result=[]
    for key in jsonfilesname:
        data = s3.get_object(Bucket=bucket, Key=key)
        content = json.loads(data['Body'].read().decode("utf-8"))
        result.append(content)

    # Do something with the merged content
    print(json.dumps(result))
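The question also asks for the merged file to be compressed and stored in S3, which the handler above leaves as "do something with the merged content". One possible way to do that is sketched below, assuming gzip is an acceptable compression format. The helper names `merge_json_objects` and `upload_gzipped_json` are hypothetical, and the S3 client is passed in as a parameter so the helpers can be exercised without AWS credentials. Only the `get_object` and `put_object` calls of the boto3 S3 client are used.

```python
import gzip
import json


def merge_json_objects(s3_client, bucket, keys):
    """Read each key from the bucket and return the parsed documents as a list."""
    merged = []
    for key in keys:
        response = s3_client.get_object(Bucket=bucket, Key=key)
        merged.append(json.loads(response["Body"].read().decode("utf-8")))
    return merged


def upload_gzipped_json(s3_client, bucket, key, payload):
    """Serialize payload as JSON, gzip it in memory, and write it to S3.

    Returns the size of the compressed body in bytes.
    """
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/json",
        ContentEncoding="gzip",  # optional metadata; drop it if not wanted
    )
    return len(body)
```

Inside the Lambda handler this could be called as `upload_gzipped_json(s3, bucket, 'merged/merged_file.json.gz', merge_json_objects(s3, bucket, jsonfilesname))`, where the destination key is only an example. Compressing in memory avoids writing to `/tmp`, which is reasonable as long as the merged document fits comfortably in the function's memory.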

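If you are on AWS, I would suggest using S3DistCp to merge the JSON files, since it provides a fault-tolerant, distributed approach that can keep up with large files by leveraging MapReduce. However, it does not appear to support in-place merging.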

