
How to zip files in an Amazon S3 bucket and get its URL

I have a bunch of files in an Amazon S3 bucket, and I want to zip those files and download the contents via an S3 URL using Java Spring.

S3 is not a file server, nor does it offer operating system file services, such as data manipulation.

If there are many "HUGE" files, your best bet is to:

  1. Start a simple EC2 instance.
  2. Download all those files to the EC2 instance, compress them, and re-upload the archive back to the S3 bucket under a new object name (see the sketch after this list).
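For illustration, here is a minimal boto3 sketch of that download-compress-reupload flow, assuming the EC2 instance has an IAM role with access to the bucket; the bucket name and object keys below are placeholders:

import os
import zipfile

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"                     # placeholder bucket name
keys = ["data/a.csv", "data/b.csv"]      # placeholder object keys

# Download each object to local disk on the EC2 instance
local_files = []
for key in keys:
    local_path = os.path.basename(key)
    s3.download_file(bucket, key, local_path)
    local_files.append(local_path)

# Compress the downloaded files into a single archive
with zipfile.ZipFile("bundle.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in local_files:
        zf.write(path)

# Re-upload the archive to S3 under a new object name
s3.upload_file("bundle.zip", bucket, "archives/bundle.zip")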

Yes, you can use AWS Lambda to do the same thing, but Lambda is bound to a 900-second (15 minute) execution timeout (thus it is recommended to allocate more RAM to boost Lambda execution performance).

Traffic from S3 to EC2 instances and other services in the same region is FREE.

If your main purpose is just to read those files within the same AWS region using EC2 or other services, then you don't need this extra step. Just access the files directly.

(Update): As mentioned by @Robert Reiz, now you can also use AWS Fargate to do the job.

Note:

It is recommended to access and share files using the AWS API. If you intend to share a file publicly, you must take the security implications seriously and impose download restrictions. AWS traffic out to the internet is never cheap.

Zip them on your end instead of doing it in AWS, ideally in the frontend, directly in the user's browser. You can stream the download of several files in JavaScript, use that stream to create a zip, and save the zip to the user's disk.

The advantages of moving the zipping to the frontend:

  • You can use it with S3 URLs, a bunch of presigned links, or even mix content from different sources: some from S3, some from anywhere else.
  • You don't waste Lambda memory, nor have to spin up an EC2 or Fargate instance, which saves money. Let the user's computer do it for you.
  • It improves the user experience: no need to wait for the zip to be created before the download starts; the download begins while the zip is being created.

StreamSaver is useful for this purpose, but its zipping example (Saving multiple files as a zip) is limited to files under 4 GB because it doesn't implement zip64.

If you choose this option, keep in mind that if you have CORS enabled on your bucket, you will need to add the frontend URL where the zipping is done to the AllowedOrigins field of your bucket's CORS configuration.
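For example, a minimal sketch of adding that origin with boto3's put_bucket_cors; the bucket name and frontend URL below are placeholders:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="my-bucket",  # placeholder bucket name
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["https://app.example.com"],  # your frontend URL
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
            }
        ]
    },
)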

If you need individual files (objects) in S3 compressed, then it is possible to do so in a round-about way. You can define a CloudFront endpoint pointing to the S3 bucket, then let CloudFront compress the content on the way out: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html

Hi, I recently had to do this for my application: serve a bundle of files in zip format through a URL link that users can download.

In a nutshell: first create an object using BytesIO, then use ZipFile to write into this object by iterating over all the S3 objects, then call put_object on this zip object and create a presigned URL for it.

The code I used looks like this:

First, call this function to get the zip object; ObjectKeys are the S3 objects that you need to put into the zip file.


import os
import zipfile
from io import BytesIO


def zipResults(bucketName, ObjectKeys):
    # In-memory buffer that will hold the zip archive
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zip_file:
        for ObjectKey in ObjectKeys:
            # S3Helper is a project-specific wrapper that returns the object's bytes
            objectContent = S3Helper().readFromS3(bucketName, ObjectKey)
            fileName = os.path.basename(ObjectKey)
            zip_file.writestr(fileName, objectContent)

    # Rewind so the buffer is read from the start when uploaded
    buffer.seek(0)
    return buffer

Then call this function; key is the key you give to your zip object:

import logging

from botocore.exceptions import ClientError


def uploadObject(bucketName, body, key):
    # AwsHelper is a project-specific wrapper around boto3.client("s3")
    s3client = AwsHelper().getClient("s3")
    try:
        response = s3client.put_object(
            Bucket=bucketName,
            Body=body,
            Key=key
        )
    except ClientError as e:
        logging.error(e)
        return None

    return response

Of course, you would need the io, zipfile, and boto3 modules.
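To tie it together, a hypothetical usage sketch; the bucket name and keys are placeholders, and generate_presigned_url is the standard boto3 call for creating a time-limited download link:

# Build the zip in memory and upload it under a new key
zip_buffer = zipResults("my-bucket", ["docs/a.pdf", "docs/b.pdf"])
uploadObject("my-bucket", zip_buffer, "downloads/bundle.zip")

# Create a presigned URL that expires after one hour
s3client = AwsHelper().getClient("s3")
url = s3client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "downloads/bundle.zip"},
    ExpiresIn=3600,
)
print(url)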
