
Unzip file from S3, write to a CSV file and push back to S3

I have built a Lambda that collects logs from an EC2 instance and uploads them to an S3 bucket on a daily basis. The logs are stored as .gz files, and now I want to build another Lambda that collects the most recently uploaded log file, unzips it, writes it to a CSV file, and then pushes it back up to S3.

I've managed to collect a log file, unzip it and push it back up, but I would like some direction on how to target the most recent file in the S3 bucket, and how to write it to a CSV before pushing it back up.

I'm using Python for my Lambda, and this is what my code looks like right now:

import gzip
from io import BytesIO

import boto3


def lambda_handler(event, context):
    s3 = boto3.client('s3', use_ssl=False)

    # Download the gzipped object, decompress it as a stream,
    # and upload the result back to S3
    s3.upload_fileobj(
        Fileobj=gzip.GzipFile(
            None,
            'rb',
            fileobj=BytesIO(
                s3.get_object(Bucket='bucketName', Key='key')['Body'].read())),
        Bucket='bucketName',
        Key='key')
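
For the CSV part of the question, one possible approach is to decompress the object in memory, turn each log line into a row with Python's csv module, and upload the result. A minimal sketch, assuming whitespace-delimited log lines; the helper name gz_object_to_csv and the parsing logic are placeholders, not anything from the original code:

import csv
import gzip
from io import StringIO

import boto3

s3 = boto3.client('s3')


def gz_object_to_csv(bucket, src_key, dest_key):
    # Download and decompress the gzipped log in memory
    raw = s3.get_object(Bucket=bucket, Key=src_key)['Body'].read()
    text = gzip.decompress(raw).decode('utf-8')

    # Write one CSV row per log line; the split on whitespace is a
    # placeholder for whatever format the logs actually use
    buf = StringIO()
    writer = csv.writer(buf)
    for line in text.splitlines():
        writer.writerow(line.split())

    # Push the CSV back up to S3
    s3.put_object(Bucket=bucket, Key=dest_key,
                  Body=buf.getvalue().encode('utf-8'))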

You don't need to worry about querying the latest object in S3. Just use an S3 Event that triggers your Lambda function.

This means that whenever your Lambda is invoked, it will be invoked with the last inserted object on S3, and therefore the most recent one.
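
With that setup, the handler receives the bucket name and object key of the newly uploaded file in the event payload, so it can operate directly on the object that triggered it. A minimal sketch, reusing the gz_object_to_csv helper sketched above and assuming the output key simply swaps the .gz extension for .csv:

import urllib.parse


def lambda_handler(event, context):
    # Each S3 event record carries the bucket name and the
    # URL-encoded key of the object that triggered this invocation
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = urllib.parse.unquote_plus(record['object']['key'])

    # Process exactly the file that was just uploaded
    gz_object_to_csv(bucket, key, key.replace('.gz', '.csv'))

One caveat: if the function writes its output into the same bucket that triggers it, filter the event notification (for example, on the .gz suffix) so the generated CSV files don't re-trigger the Lambda.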
