AWS Lambda: read a zip file, validate it, and extract it to an S3 bucket if validation passes

Question

I have a requirement in which a zip file arrives in an S3 bucket. I need to write a Lambda function in Python that reads the zip file, performs some validation, and unzips it into another S3 bucket.

The zip file contains the following:

a.csv b.csv c.csv trigger_file.txt

trigger_file.txt: contains the names of the files in the zip and their record counts (example: a.csv:120, b.csv:10, c.csv:50)
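A minimal sketch for turning the trigger file into something the validation can compare against, assuming it holds `name:count` pairs separated by commas and/or newlines exactly as in the example above. The function name `parse_trigger_file` is hypothetical.

```python
from typing import Dict


def parse_trigger_file(text: str) -> Dict[str, int]:
    """Parse 'a.csv:120, b.csv:10, c.csv:50' into {'a.csv': 120, ...}.

    Assumption: entries are separated by commas and/or newlines, and each
    entry has the form 'file_name:record_count' as in the question's example.
    """
    counts: Dict[str, int] = {}
    for entry in text.replace("\n", ",").split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing comma or newline
        name, sep, count = entry.rpartition(":")
        if not sep:
            raise ValueError(f"malformed trigger entry: {entry!r}")
        counts[name.strip()] = int(count)
    return counts
```

Inside the Lambda this would be fed with something like `parse_trigger_file(zipf.read(trigger_name).decode("utf-8"))`. Keeping names and counts in a dict lets later checks compare exact file names instead of searching inside a string.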

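So, in the Lambda function, I need to read the trigger file and check whether the number of files in the zip equals the number of files listed in the trigger file; if the check passes, unzip the files into the S3 bucket.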
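The validation described here could be sketched as below, assuming the CSVs sit at the root of the archive and that `expected` is a name-to-count dict such as one parsed from the trigger file. The helper `validate_archive` and its `header_rows` parameter are hypothetical; the question does not say whether the record counts include a header row, so that is left to the caller. Comparing the sets of names also covers the "same number of files" check.

```python
import csv
import io
import zipfile
from typing import Dict, List


def validate_archive(zf: zipfile.ZipFile, expected: Dict[str, int],
                     trigger_name: str, header_rows: int = 0) -> List[str]:
    """Return a list of problems; an empty list means validation passed.

    `expected` maps file name -> record count (e.g. parsed from the trigger
    file). Assumption: every CSV row after the first `header_rows` is a record.
    """
    problems: List[str] = []
    # Exact set membership, so 'a.csv' can never match 'aa.csv' by substring.
    members = {n for n in zf.namelist()
               if n != trigger_name and not n.endswith("/")}
    missing = set(expected) - members
    extra = members - set(expected)
    if missing:
        problems.append(f"missing from zip: {sorted(missing)}")
    if extra:
        problems.append(f"not listed in trigger file: {sorted(extra)}")

    # Count records only for files that are both in the zip and the trigger.
    for name in sorted(members & set(expected)):
        with zf.open(name) as raw, \
                io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            records = sum(1 for _ in csv.reader(text)) - header_rows
        if records != expected[name]:
            problems.append(
                f"{name}: expected {expected[name]} records, found {records}")
    return problems
```

Returning a list of problems rather than printing lets the handler decide whether to extract, log, or raise.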

Below is the code I have prepared:

def write_to_s3(config_dict):
    inp_bucket = config_dict["inp_bucket"]
    inp_key = config_dict["inp_key"]
    out_bucket = config_dict["out_bucket"]
    des_key = config_dict["des_key"]
    processed_key = config_dict["processed_key"]

    obj = S3_CLIENT.get_object(Bucket=inp_bucket, Key=inp_key)
    putObjects = []
    with io.BytesIO(obj["Body"].read()) as tf:
        # rewind the file
        tf.seek(0)

    # Read the file as a zipfile perform transformations and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        for file in zipf.infolist():
            fileName = file.filename
            print("file name before while loop :",fileName)
            try:
                found = False
                while not found :
                    if fileName == "Trigger_file.txt" :
                        with zipf.open(fileName , 'r') as thefile:
                            my_list = [i.decode('utf8').split(' ') for i in thefile]
                            my_list = str(my_list)[1:-1]
                            print("my_list :",my_list)
                            print("fileName :",fileName)
                            found = True
                            break
                            thefile.close()
                    else:
                        print("Trigger file not found ,try again")
            except Exception as exp_handler:
                    raise exp_handler

            if 'csv' in fileName :
                try:
                    if fileName in my_list:
                        print("Validation Success , all files in Trigger file  are present procced for extraction")
                    else:
                        print("Validation Failed")
                except Exception as exp_handler:
                    raise exp_handler

    # *****FUNCTION TO UNZIP ********


def lambda_handler(event, context):
    try:
        inp_bucket = event['Records'][0]['s3']['bucket']['name']
        inp_key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        config_dict = build_conf_obj(os.environ['config_bucket'],os.environ['config_file'], os.environ['param_name'])
        write_to_s3(config_dict)
    except Exception as exp_handler:
        print("ERROR")
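As pasted, the `with zipfile.ZipFile(tf, mode='r')` block starts after the `with io.BytesIO(...) as tf:` block has already ended, so `tf` is closed by the time `ZipFile` tries to read it, which raises a `ValueError` (this may just be an indentation slip when copying the code). Separately, the handler decodes `inp_bucket` and `inp_key` from the event but never passes them to `write_to_s3`. A small sketch of both pieces follows; `location_from_event` and `open_zip` are hypothetical helper names, and the event fields are the same ones the handler already reads.

```python
import io
import urllib.parse
import zipfile
from typing import Tuple


def location_from_event(event: dict) -> Tuple[str, str]:
    """Pull the bucket name and URL-decoded object key out of an S3 event."""
    s3 = event["Records"][0]["s3"]
    bucket = s3["bucket"]["name"]
    # Keys arrive URL-encoded (e.g. spaces as '+'), hence unquote_plus.
    key = urllib.parse.unquote_plus(s3["object"]["key"], encoding="utf-8")
    return bucket, key


def open_zip(data: bytes) -> zipfile.ZipFile:
    """Wrap downloaded bytes in a BytesIO that ZipFile can seek in.

    The BytesIO is created here and held by the returned ZipFile, so it is
    not closed before the archive is read (unlike the pasted version).
    """
    return zipfile.ZipFile(io.BytesIO(data), mode="r")
```

In the handler, the bytes would come from `S3_CLIENT.get_object(Bucket=bucket, Key=key)["Body"].read()`, and all reading of the archive should happen inside `with open_zip(data) as zipf:`.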

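Everything else is working; the only problem I'm facing is the validation part. I think the while loop is wrong, because it goes into an infinite loop.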

Expected behavior:

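Search the zip folder for trigger_file.txt. If it is found, break out of the loop, perform the validation, and extract the files to the S3 folder. If it is not found, keep searching until every entry in the zip has been checked.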
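The expected behavior maps onto a plain `for` loop over the archive entries: return as soon as the trigger file is found, and report "not found" only after every entry has been checked, so the search cannot loop forever. Note that the question names the file `trigger_file.txt` while the code compares against `Trigger_file.txt`; `zipfile` member names are compared exactly, so this sketch matches case-insensitively and ignores any folder prefix, both assumptions. `find_trigger` is a hypothetical name.

```python
from typing import Iterable, Optional


def find_trigger(names: Iterable[str],
                 wanted: str = "trigger_file.txt") -> Optional[str]:
    """Return the first member name whose file name matches `wanted`, else None.

    Pass zipf.namelist(). Every name is visited at most once, so the
    search always terminates.
    """
    for name in names:
        base = name.rsplit("/", 1)[-1]  # drop any folder prefix inside the zip
        if base.lower() == wanted.lower():  # assumption: case-insensitive match
            return name  # found: stop searching
    return None  # checked every entry without a match
```

Typical use: `trigger = find_trigger(zipf.namelist())`, then stop early with an error if `trigger is None`.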

Error OUTPUT (timeout):

Response:
{
  "errorMessage": "2020-06-16T20:09:06.168Z 39253b98-db87-4e65-b288-b585d268ac5f Task timed out after 60.06 seconds"
}

Request ID:
"39253b98-db87-4e65-b288-b585d268ac5f"

Function Logs:
 again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,try again
Trigger file not found ,trEND RequestId: 39253b98-db87-4e65-b288-b585d268ac5f
REPORT RequestId: 39253b98-db87-4e65-b288-b585d268ac5f  Duration: 60060.06 ms   Billed Duration: 60000 ms   Memory Size: 3008 MB    Max Memory Used: 83 MB  Init Duration: 389.65 ms    
2020-06-16T20:09:06.168Z 39253

Answer 1

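In the following while loop in your code, if fileName is not "Trigger_file.txt", execution goes into the else branch, which only prints a message. Neither fileName nor found changes there, so `while not found` stays true and the loop never ends.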

found = False
while not found:
    if fileName == "Trigger_file.txt":
        with zipf.open(fileName, 'r') as thefile:
            my_list = [i.decode('utf8').split(' ') for i in thefile]
            my_list = str(my_list)[1:-1]
            print("my_list:", my_list)
            print("fileName:", fileName)
            found = True
            break
            thefile.close()
    else:
        print("Trigger file not found,try again")

I think you can replace part of your write_to_s3 code with the following:

def write_to_s3(config_dict):

    ######################
    #### Do something ####
    ######################    

    # Read the file as a zipfile perform transformations and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        found = False
        for file in zipf.infolist():
            fileName = file.filename
            if fileName == "Trigger_file.txt":
                with zipf.open(fileName, 'r') as thefile:
                    my_list = [i.decode('utf8').split(' ') for i in thefile]
                    my_list = str(my_list)[1:-1]
                    print("my_list :", my_list)
                    print("fileName :", fileName)
                    found = True
                break  # leaving the with-block has already closed thefile

        if not found:
            print("Trigger file not found")
            return

        for file in zipf.infolist():
            fileName = file.filename
            if 'csv' in fileName:
                if fileName not in my_list:
                    print("Validation Failed")
                    return

        print("Validation success: all files in the trigger file are present, proceed with extraction")

    # *****FUNCTION TO UNZIP ********
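The unzip step is left as a placeholder here. One way to fill it, as a sketch only: assume the zip body has already been read into bytes and that `s3_client` is a boto3 S3 client, and upload each member with `put_object`. The name `extract_to_s3` and its `prefix`/`skip` parameters are hypothetical; whether the trigger file itself should be copied is not stated, so `skip` is optional.

```python
import io
import zipfile
from typing import List, Optional


def extract_to_s3(zip_bytes: bytes, s3_client, out_bucket: str,
                  prefix: str = "", skip: Optional[str] = None) -> List[str]:
    """Upload each file inside the zip to s3://<out_bucket>/<prefix><name>.

    `s3_client` is assumed to be a boto3 S3 client; only put_object is used.
    `skip` optionally names a member (e.g. the trigger file) not to copy.
    Returns the destination keys that were written.
    """
    written: List[str] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir() or info.filename == skip:
                continue  # folders and the skipped member are not uploaded
            key = prefix + info.filename
            with zf.open(info) as member:
                # Each member is read fully into memory before the upload.
                s3_client.put_object(Bucket=out_bucket, Key=key,
                                     Body=member.read())
            written.append(key)
    return written
```

For example, after validation passes: `extract_to_s3(data, S3_CLIENT, out_bucket, prefix="unzipped/", skip=trigger_name)`, where the `unzipped/` prefix is only illustrative. Each member is held in memory during its upload; for very large members, boto3's `upload_fileobj` can stream from a file object instead.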

Solution 1
1 2020-06-18 08:37:44
