

Zipping 100's of files stored on S3

We use S3 to store various media uploaded via our application, such as images, documents, etc. We work in the property software industry, and as a means of exchanging data stored in our system with property portals, a common exchange format between portals is the Rightmove BLM data feed specification. This is essentially a zip file containing a delimited text file and any associated media, which is sent via FTP to each portal.

However, a bottleneck in the process is downloading the media from S3 for zipping. For example, a single account on our system could have in the region of 1000 images/documents to be downloaded and zipped in preparation for transfer (each file has to be named in a particular format for that particular portal: unique number, sequence numbers, etc.). Downloading 1000 images/documents from S3 to an EC2 server in the same region via the PHP SDK takes some time (60+ seconds), and doing this for multiple accounts at the same time puts considerable load on the server.

Is there a better/faster way to download files from S3 so they can be prepped and zipped on the EC2 instance?
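For reference, since the S3 GETs are network-bound, the usual speedup is to issue them concurrently rather than one at a time. Below is a minimal sketch of that idea; it is in Python rather than PHP, and `fetch_object` is a stand-in for a real S3 read (e.g. boto3's `get_object`), so the S3 call itself is an assumption:

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor

def fetch_object(key):
    # Stand-in for a real S3 GET, e.g.
    # s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return ("contents of %s" % key).encode()

def build_zip(keys, max_workers=32):
    # Fetch all objects concurrently; the downloads are network-bound,
    # so a thread pool cuts total wall-clock time substantially.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(fetch_object, keys))
    # Write everything into a single in-memory zip archive.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for key, body in zip(keys, bodies):
            # Rename here to the portal's required format
            # (unique number, sequence number, etc.).
            archive.writestr(key, body)
    return buffer.getvalue()
```

With 1000 objects and ~32 workers, total download time is roughly the slowest batch rather than the sum of all 1000 sequential GETs.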

Thanks.

One option would be to aggregate the zip as the files are added. Meaning, instead of zipping the files all at once, use a Lambda function to add them to a zip file as they're added to or updated on the S3 bucket. Then, the zip would be available more-or-less on demand.
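The core of that idea is "append one entry to an existing archive". A minimal, hypothetical Python sketch of that step is below; in a real Lambda the function would be triggered by an S3 `ObjectCreated` event and would read/write the account's zip via boto3, but here the archive is plain bytes so the logic runs anywhere:

```python
import io
import zipfile

def add_to_zip(existing_zip, name, body):
    """Return zip bytes containing everything in existing_zip plus one new entry."""
    buffer = io.BytesIO(existing_zip if existing_zip else b"")
    # Append mode keeps the existing entries intact; start a fresh
    # archive when there is no zip for this account yet.
    mode = "a" if existing_zip else "w"
    with zipfile.ZipFile(buffer, mode, zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, body)
    return buffer.getvalue()
```

Note that with S3 this still means downloading and re-uploading the whole archive on each update, so it trades the one-off 60-second burst for small incremental work spread over time.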

