Get zip files from one S3 bucket and unzip them to another S3 bucket
I have zip files in one S3 bucket. I need to unzip them and copy the unzipped files to another S3 bucket, keeping the source path.
For example, if in the source bucket the zip file is under
"s3://bucketname/foo/bar/file.zip"
then in destination bucket it should be "s3://destbucketname/foo/bar/zipname/files.."
How can it be done? I know it is somehow possible to do this with Lambda, so I won't have to download the files locally, but I have no idea how.
Thanks!
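For reference, the key mapping described above (source zip key to destination prefix) can be expressed as a small pure function. This is just an illustrative sketch; `dest_prefix` is a name introduced here, not part of any library:

```python
from pathlib import PurePosixPath


def dest_prefix(source_key: str) -> str:
    """Map a source zip key to the destination prefix that keeps the path.

    "foo/bar/file.zip" -> "foo/bar/file/"
    """
    p = PurePosixPath(source_key)
    return str(p.parent / p.stem) + "/"
```

Each extracted file would then be uploaded to the destination bucket under `dest_prefix(key) + member_name`.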
If your desire is to trigger the above process as soon as the zip file is uploaded into the bucket, then you could write an AWS Lambda function.
When the Lambda function is triggered, it will be passed the name of the bucket and the object that was uploaded. The function should then:

1. Download the zip file to /tmp
2. Unzip it
3. Upload the extracted files to the destination bucket, keeping the source path
For a general example, see: Tutorial: Using AWS Lambda with Amazon S3 - AWS Lambda
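The steps above could be sketched as a handler along these lines. This is an untested outline, not a drop-in implementation: the bucket name and the `member_key` helper are assumptions, and error handling and /tmp size limits are ignored:

```python
import os
import zipfile
from pathlib import PurePosixPath

DEST_BUCKET = "destbucketname"  # assumption: replace with your bucket


def member_key(zip_key: str, member: str) -> str:
    """Destination key for one extracted file, keeping the source path:
    foo/bar/file.zip + data.csv -> foo/bar/file/data.csv
    """
    p = PurePosixPath(zip_key)
    return str(p.parent / p.stem / member)


def lambda_handler(event, context):
    import boto3  # imported lazily so the helper above is usable without the SDK
    s3 = boto3.client("s3")

    # An S3 event notification may carry several records, one per object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        local_path = os.path.join("/tmp", os.path.basename(key))

        # 1. Download the zip file to /tmp
        s3.download_file(bucket, key, local_path)

        # 2. Unzip it and 3. upload each file, keeping the source path
        with zipfile.ZipFile(local_path) as zf:
            for member in zf.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                with zf.open(member) as f:
                    s3.upload_fileobj(f, DEST_BUCKET, member_key(key, member))

        os.remove(local_path)  # /tmp persists across warm invocations
```

Note that Lambda's /tmp storage is limited (512 MB by default), so very large archives need a streaming approach such as the one in the last answer below.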
You can use AWS Lambda for this. You can also set an event notification on your S3 bucket so that a Lambda function is triggered every time a new file arrives. You can write Python code that uses boto3 to connect to S3, read each zip file into a buffer, unzip it with these libraries, gzip the contents, and then re-upload them to S3 under your desired folder/path:
import gzip
import io
import zipfile

import boto3

s3 = boto3.resource("s3")
destinationbucket = s3.Bucket("destbucketname")

# `zipped` is a zipfile.ZipFile opened over the downloaded archive,
# and `file` is one name from zipped.namelist():
with zipped.open(file, "r") as f_in:
    gzipped_content = gzip.compress(f_in.read())
    destinationbucket.upload_fileobj(
        io.BytesIO(gzipped_content),
        final_file_path,
        ExtraArgs={"ContentType": "text/plain"},
    )
There is also a tutorial here: https://betterprogramming.pub/unzip-and-gzip-incoming-s3-files-with-aws-lambda-f7bccf0099c9
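The unzip-and-gzip core of this answer can be isolated as a pure, testable helper (a sketch; `gzip_zip_members` is an illustrative name, and the S3 download/upload around it is the part shown in the snippet above):

```python
import gzip
import io
import zipfile


def gzip_zip_members(zip_bytes: bytes):
    """Yield (member_name, gzipped_bytes) for each file in a zip archive.

    The caller downloads the archive bytes from the source bucket and
    uploads each gzipped member to the destination bucket.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to upload
            with zf.open(name) as f:
                yield name, gzip.compress(f.read())
```

Keeping this logic separate from the boto3 calls makes it easy to unit-test the Lambda without touching AWS.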
Arguably Python is simpler to use for your Lambda, but if you are considering Java, I've made a library that manages unzipping of data in AWS S3 using stream download and multipart upload.
Unzipping is achieved without keeping data in memory or writing to disk. That makes it suitable for large data files: it has been used to unzip files of 100GB+.
It is available in Maven Central; here is the GitHub link: nejckorasa/s3-stream-unzip