
Process data in AWS S3 from EC2 instance

Original post: 2016-07-10 09:01:28. Tags: amazon-web-services, amazon-s3, amazon-ec2

I'm wondering what the best way is to process huge numbers of images stored in AWS S3 buckets from an EC2 instance located in the same availability zone.

Should I download the images I need each time I have to process them, delete them when I'm done, and repeat that every time I need to do some processing?

Or is there a better way, like mounting the S3 bucket on the EC2 instance? I have seen tools like FUSE for mounting, but I am not sure whether this is the best way of processing the data.

1 Answer

First of all, note that any EC2 instance can be terminated, so keep your data and results in durable storage such as S3.
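As a rough illustration of writing results back to S3 rather than leaving them on the instance, here is a minimal sketch. It assumes the boto3 library and an S3 client created with `boto3.client("s3")`; the function name `save_result` and the bucket and key names in the usage comment are hypothetical.

```python
def save_result(s3_client, bucket, key, data):
    """Upload result bytes to S3 so they outlive the EC2 instance.

    s3_client is assumed to be a boto3 S3 client; bucket and key are
    whatever names you choose for your own results.
    """
    s3_client.put_object(Bucket=bucket, Key=key, Body=data)
    # Return the object's location for logging or bookkeeping.
    return f"s3://{bucket}/{key}"


# Example usage (hypothetical names, requires AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   save_result(s3, "my-results-bucket", "thumbnails/cat.jpg", thumbnail_bytes)
```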

If you fetch the whole image into memory and then process it, I can't see any need to write it to disk. On the other hand, if an image is quite big, you may end up fetching each part many times, in which case a local copy could pay off. So there is no easy answer, at least without more information.
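The two access patterns mentioned above could look roughly like this. This is a sketch under assumptions, not a definitive implementation: it assumes a boto3 S3 client is passed in, and the helper names (`fetch_object_bytes`, `fetch_object_range`, `process_images`) and the `process` callback are hypothetical.

```python
def fetch_object_bytes(s3_client, bucket, key):
    """Download one S3 object fully into memory; nothing touches the disk."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def fetch_object_range(s3_client, bucket, key, start, end):
    """Download only bytes start..end (inclusive) via an HTTP Range request.

    Useful when an image is too large to hold in memory at once.
    """
    response = s3_client.get_object(
        Bucket=bucket, Key=key, Range=f"bytes={start}-{end}"
    )
    return response["Body"].read()


def process_images(s3_client, bucket, keys, process):
    """Fetch each object into memory and apply process(data) to it."""
    results = {}
    for key in keys:
        results[key] = process(fetch_object_bytes(s3_client, bucket, key))
    return results


# Example usage (hypothetical names, requires AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   sizes = process_images(s3, "my-images", ["a.jpg", "b.jpg"], len)
```

If the same large object is read repeatedly, each ranged request is another round trip to S3, which is the trade-off against keeping a copy on local disk.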

You can look at MapReduce solutions and how they deal with keeping data close to the processing unit. Spark is able to process data in memory.
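To show the general shape of a map-then-reduce job without bringing in Spark itself, here is a small standard-library sketch. It is not Spark and does not distribute work across machines; it only runs a map function over a list of keys in a thread pool and folds the results together. The function name `map_reduce_over_keys` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_reduce_over_keys(keys, map_fn, reduce_fn, initial, max_workers=8):
    """Apply map_fn to every key in parallel, then fold with reduce_fn.

    map_fn would typically fetch one object and compute something from it;
    threads suit this because the work is mostly waiting on the network.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = pool.map(map_fn, keys)  # yields results in input order
        return reduce(reduce_fn, mapped, initial)


# Example: total length of all keys (stand-in for "total bytes processed").
total = map_reduce_over_keys(["a.jpg", "bb.jpg"], len, lambda x, y: x + y, 0)
```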

About mounting resources: there are other options that can be mounted, such as Elastic File System (EFS) or Elastic Block Store (EBS).