Creating a zip file on S3 from CSV files on S3 using Lambda

Question

About 60 CSV files are generated in my S3 bucket every day. Each file averages about 500 MB. I want to zip all of these files on the fly with a Lambda function (without downloading the file inside the Lambda execution environment) and upload the zipped files to another S3 bucket. I came across these solutions 1 and 2, but I am still having problems with the implementation. Right now, I am trying to stream the CSV file data into a zip file (created in the Lambda /tmp directory) and then upload it to S3. But I get this error message while writing to the zip file: [Errno 36] File name too long

This is my test Lambda function. I am only trying it with one file here, but in the real case I need to zip 50-60 files.

import boto3
import zipfile


def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    iterator = s3.Object('bucket-name', 'file-name').get()['Body'].iter_lines()
    my_zip = zipfile.ZipFile('/tmp/test.zip', 'w')
    for line in iterator:
        my_zip.write(line)
    
    s3.meta.client.upload_file('/tmp/test.zip', 'another-bucket-name', 'object-name')

Also, is there a way to stream data from my CSV file, zip it, and upload it to another S3 bucket without actually saving the full zip file in Lambda memory?
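
A note on the error: ZipFile.write() expects the path of a file on disk, so passing it a line of CSV data makes Python treat that line as a file name, which is consistent with the [Errno 36] message. Below is a minimal sketch of an alternative, assuming the data arrives as an iterable of bytes chunks. ZipFile.open(name, 'w') returns a writable handle for a single archive member. The helper name zip_stream and the sample data are illustrative only and are not part of the original post.

```python
import io
import zipfile


def zip_stream(chunks, dest, arcname):
    """Write an iterable of bytes chunks into one member of a zip archive.

    dest may be a path (for example under /tmp) or a seekable file-like object.
    """
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # open(..., "w") gives a writable file-like object for a single member,
        # so data is compressed chunk by chunk instead of being read from a path
        with archive.open(arcname, "w") as member:
            for chunk in chunks:
                member.write(chunk)


# Local demonstration with in-memory data standing in for an S3 body
buffer = io.BytesIO()
zip_stream([b"id,name\n", b"1,alice\n"], buffer, "data.csv")

with zipfile.ZipFile(buffer) as archive:
    restored = archive.read("data.csv")
```

In a Lambda, chunks could plausibly be s3.Object('bucket-name', 'file-name').get()['Body'].iter_chunks() and dest a path such as '/tmp/test.zip'. This still stores the whole archive in /tmp before the upload, so on its own it does not answer the last part of the question.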

Answer 1

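After a lot of research and trial and error, I was able to make it work. I used the smart_open library to solve my problem, and managed to zip a 550 MB file with only 150 MB of memory in my Lambda. To use an external library, I had to use Layers in Lambda. Here is my code: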

from smart_open import open, register_compressor
import lzma, os


def lambda_handler(event, context):
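    # Stream the source object line by line instead of loading it whole
    with open('s3://bucket-name-where-large-file/file-key-name') as fin:
        # smart_open picks the compressor from the key's extension (.gz or .bz2)
        with open('s3://bucket-name-to-put-zip-file/zip-file-key-name', 'w') as fout:
            for line in fin:
                fout.write(line)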

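Please note that smart_open supports .gz and .bz2 file compression. If you want to compress files into other formats, you can create your own compressor using the library's register_compressor method.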
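
As a rough illustration of the register_compressor approach, the sketch below defines a handler for .xz files built on the standard library's lzma module, following the pattern shown in smart_open's README. A handler receives a binary file object and a mode and returns a wrapping file object. The name _handle_xz and the sample data are assumptions made for this example, and the registration itself is left as a comment so the block runs without smart_open installed.

```python
import io
import lzma


def _handle_xz(file_obj, mode):
    """Wrap a binary file object so that reads and writes pass through xz."""
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)


# With smart_open installed, the handler would be registered like this,
# after which keys ending in .xz go through _handle_xz:
#     from smart_open import register_compressor
#     register_compressor('.xz', _handle_xz)

# Local round trip with an in-memory buffer standing in for an S3 object
buffer = io.BytesIO()
with _handle_xz(buffer, "wb") as fout:
    fout.write(b"id,name\n1,alice\n")

buffer.seek(0)
with _handle_xz(buffer, "rb") as fin:
    restored = fin.read()
```

Once a handler is registered, the Lambda code above should only need an output key ending in the registered extension.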

Solution 1
1 Accepted 2020-12-30 05:49:01
