Get zip files from one S3 bucket and unzip them to another S3 bucket

I have zip files in one S3 bucket. I need to unzip them and copy the unzipped files to another S3 bucket, keeping the source path.

For example, if in the source bucket the zip file is under

"s3://bucketname/foo/bar/file.zip"

then in the destination bucket it should be "s3://destbucketname/foo/bar/zipname/files.."

How can this be done? I know it is somehow possible with Lambda, so I won't have to download the files locally, but I have no idea how.

Thanks!

If your desire is to trigger the above process as soon as a zip file is uploaded into the bucket, then you could write an AWS Lambda function.

When the Lambda function is triggered, it will be passed the name of the bucket and the object that was uploaded. The function should then (a minimal sketch follows the list):

  • Download the zip file to /tmp
  • Unzip the file (beware: /tmp storage is limited, 512MB by default)
  • Loop through the unzipped files and upload them to the destination bucket
  • Delete all local files created (to free up space for any future executions of the function)
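
For illustration, here is a minimal handler sketch of those steps in Python with boto3. The destination bucket name and the "zipname/" key prefix are assumptions chosen to match the question's example; adjust both to your setup:

import os
import zipfile
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
DEST_BUCKET = "destbucketname"  # assumption: your destination bucket

def lambda_handler(event, context):
    # S3 event notifications carry the bucket and key of the uploaded object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # e.g. "foo/bar/file.zip"
        local_zip = os.path.join("/tmp", os.path.basename(key))
        s3.download_file(bucket, key, local_zip)

        # "foo/bar/file.zip" -> keys under "foo/bar/file/" in the destination
        prefix = (key[:-4] if key.endswith(".zip") else key) + "/"

        with zipfile.ZipFile(local_zip) as archive:
            for name in archive.namelist():
                extracted = archive.extract(name, "/tmp/unzipped")
                if os.path.isfile(extracted):  # skip directory entries
                    s3.upload_file(extracted, DEST_BUCKET, prefix + name)
                    os.remove(extracted)  # free /tmp for later invocations
        os.remove(local_zip)

Deleting each extracted file right after its upload keeps /tmp usage low even for archives with many entries.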

For a general example, see: Tutorial: Using AWS Lambda with Amazon S3 - AWS Lambda

You can use AWS Lambda for this. You can also set an event notification on your S3 bucket so that a Lambda function is triggered every time a new file arrives. You can write Python code that uses boto3 to connect to S3. Then you can read the files into a buffer, unzip them using these libraries, gzip them, and re-upload them to S3 under your desired folder/path:

import gzip
import io
import zipfile

import boto3

s3 = boto3.resource("s3")
destinationbucket = s3.Bucket("destbucketname")

# read the source zip into memory and open it as an archive
zip_bytes = s3.Object("bucketname", "foo/bar/file.zip").get()["Body"].read()
zipped = zipfile.ZipFile(io.BytesIO(zip_bytes))

for file in zipped.namelist():
    final_file_path = "foo/bar/file/" + file  # preserve the source path
    with zipped.open(file, "r") as f_in:
        gzipped_content = gzip.compress(f_in.read())
        destinationbucket.upload_fileobj(io.BytesIO(gzipped_content),
                                         final_file_path,
                                         ExtraArgs={"ContentType": "text/plain"})

There is also a tutorial here: https://betterprogramming.pub/unzip-and-gzip-incoming-s3-files-with-aws-lambda-f7bccf0099c9
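
If you prefer to set up the event notification mentioned above in code rather than in the S3 console, a sketch with boto3 could look like this; the bucket name and Lambda ARN are placeholders:

import boto3

s3 = boto3.client("s3")

# assumption: placeholder bucket name and function ARN; replace with your own
s3.put_bucket_notification_configuration(
    Bucket="bucketname",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:unzip",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "suffix", "Value": ".zip"},
            ]}},
        }]
    },
)

Note that S3 will reject this configuration until the Lambda function has granted s3.amazonaws.com permission to invoke it (for example via the AWS CLI's lambda add-permission).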

Arguably Python is simpler to use for your Lambda, but if you are considering Java, I've made a library that manages unzipping of data in AWS S3 utilising stream download and multipart upload.

Unzipping is achieved without keeping data in memory or writing to disk. That makes it suitable for large data files - it has been used to unzip files of 100GB+.

It is available in Maven Central; here is the GitHub link: nejckorasa/s3-stream-unzip

