Get zip files from one S3 bucket and unzip them to another S3 bucket
I have zip files in one S3 bucket. I need to unzip them and copy the unzipped files to another S3 bucket, keeping the source path.
For example, if in the source bucket the zip file is under
"s3://bucketname/foo/bar/file.zip"
then in destination bucket it should be "s3://destbucketname/foo/bar/zipname/files.."
How can it be done? I know it is somehow possible to do this with Lambda, so I won't have to download the files locally, but I have no idea how.
Thanks!
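For reference, the key mapping described above (source zip key to destination prefix) can be expressed as a small pure function. This is just an illustrative sketch; `dest_prefix` is a name introduced here, not part of any library:

```python
from pathlib import PurePosixPath


def dest_prefix(source_key: str) -> str:
    """Map a source zip key to the destination prefix that keeps the path.

    "foo/bar/file.zip" -> "foo/bar/file/"
    """
    p = PurePosixPath(source_key)
    return str(p.parent / p.stem) + "/"
```

Each extracted file would then be uploaded to the destination bucket under `dest_prefix(key) + member_name`.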
If your desire is to trigger the above process as soon as the zip file is uploaded into the bucket, then you could write an AWS Lambda function.
When the Lambda function is triggered, it will be passed the name of the bucket and the object that was uploaded. The function should then:

1. Download the zip file to /tmp
2. Unzip it
3. Upload the extracted files to the destination bucket, keeping the source path
For a general example, see: Tutorial: Using AWS Lambda with Amazon S3 - AWS Lambda
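The steps above could be sketched as a handler along these lines. This is an untested outline, not a drop-in implementation: the bucket name and the `member_key` helper are assumptions, and error handling and /tmp size limits are ignored:

```python
import os
import zipfile
from pathlib import PurePosixPath

DEST_BUCKET = "destbucketname"  # assumption: replace with your bucket


def member_key(zip_key: str, member: str) -> str:
    """Destination key for one extracted file, keeping the source path:
    foo/bar/file.zip + data.csv -> foo/bar/file/data.csv
    """
    p = PurePosixPath(zip_key)
    return str(p.parent / p.stem / member)


def lambda_handler(event, context):
    import boto3  # imported lazily so the helper above is usable without the SDK
    s3 = boto3.client("s3")

    # An S3 event notification may carry several records, one per object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        local_path = os.path.join("/tmp", os.path.basename(key))

        # 1. Download the zip file to /tmp
        s3.download_file(bucket, key, local_path)

        # 2. Unzip it and 3. upload each file, keeping the source path
        with zipfile.ZipFile(local_path) as zf:
            for member in zf.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                with zf.open(member) as f:
                    s3.upload_fileobj(f, DEST_BUCKET, member_key(key, member))

        os.remove(local_path)  # /tmp persists across warm invocations
```

Note that Lambda's /tmp storage is limited (512 MB by default), so very large archives need a streaming approach such as the one in the last answer below.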
You can use AWS Lambda for this. You can also set an event notification on your S3 bucket so that a Lambda function is triggered every time a new file arrives. You can write Python code that uses boto3 to connect to S3, read each zip file into a buffer, unzip it with these libraries, gzip the contents, and then re-upload them to S3 under your desired folder/path:
import gzip
import io
import zipfile

import boto3

s3 = boto3.resource("s3")
destinationbucket = s3.Bucket("destbucketname")

# `zipped` is a zipfile.ZipFile opened over the downloaded archive,
# and `file` is one name from zipped.namelist():
with zipped.open(file, "r") as f_in:
    gzipped_content = gzip.compress(f_in.read())
    destinationbucket.upload_fileobj(
        io.BytesIO(gzipped_content),
        final_file_path,
        ExtraArgs={"ContentType": "text/plain"},
    )
There is also a tutorial here: https://betterprogramming.pub/unzip-and-gzip-incoming-s3-files-with-aws-lambda-f7bccf0099c9
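The unzip-and-gzip core of this answer can be isolated as a pure, testable helper (a sketch; `gzip_zip_members` is an illustrative name, and the S3 download/upload around it is the part shown in the snippet above):

```python
import gzip
import io
import zipfile


def gzip_zip_members(zip_bytes: bytes):
    """Yield (member_name, gzipped_bytes) for each file in a zip archive.

    The caller downloads the archive bytes from the source bucket and
    uploads each gzipped member to the destination bucket.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to upload
            with zf.open(name) as f:
                yield name, gzip.compress(f.read())
```

Keeping this logic separate from the boto3 calls makes it easy to unit-test the Lambda without touching AWS.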
Arguably Python is simpler to use for your Lambda, but if you are considering Java, I've made a library that manages unzipping of data in AWS S3 using stream download and multipart upload.
Unzipping is achieved without keeping data in memory or writing to disk. That makes it suitable for large data files: it has been used to unzip files of 100GB+.
It is available in Maven Central; here is the GitHub link: nejckorasa/s3-stream-unzip