

How do I transfer files from s3 to my ec2 instance whenever I add a new file to s3?

I have a py script on my ec2 instance. It requires a video file as input, which is in an S3 bucket. How do I automate the process so that the ec2 instance starts running every time a new file is added to that bucket? I want the ec2 instance to recognize the new file and add it to a local directory where the py script can process it and create the output file. I then want to send this output file back to the bucket and store it there. I know the boto3 library is used to connect s3 to ec2, but I am unclear how to trigger this automatically and look for new files without having to manually start my instance and copy everything.

Edit: I have a python program which takes a video file (mp4), breaks it into frames, and stitches them together to create a bunch of small panorama images, storing them in a folder named 'Output'. Since the program needs a video as input, it reads the mp4 file from a particular directory referenced in the code. What I now want is this: an s3 bucket will receive a video file from elsewhere, inside a folder in that bucket. I want any new mp4 file entering that bucket to be copied or sent to the input directory on my instance. When this happens, I also want the python program stored on that instance to be executed automatically, find the new video file in the input directory, process it into the small panoramas, and store them in the output directory, or even better, send them to an output folder in the same s3 bucket.

There are many ways in which you could design a solution for this. They will vary depending on how often you get new videos, whether the solution should be scalable and fault tolerant, how many videos you want to process in parallel, and more. I will provide just one, on the assumption that new videos are uploaded only occasionally and no auto-scaling groups are needed to process a large number of videos at the same time.

On the above assumption, one way could be as follows:

  1. The upload of a new video triggers a lambda function using S3 event notifications.
  2. The lambda gets the video details (e.g. the s3 path) from the S3 event, submits them to an SQS queue, and starts your instance (see the lambda sketch after this list).
  3. Your application on the instance, once started, polls the SQS queue for the details of the video file to process (see the worker sketch at the end of this answer). This requires your application to be designed so that it launches at instance start, which can be done using modified user data, systemd unit files, and more.
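For steps 1 and 2, a lambda handler could look roughly like the sketch below. This is a minimal illustration, not a drop-in implementation: the queue URL and instance ID are hypothetical placeholders, and it assumes the S3 event notification is configured to fire on object creation in your bucket.

```python
import json
from urllib.parse import unquote_plus

import boto3

sqs = boto3.client("sqs")
ec2 = boto3.client("ec2")

# Hypothetical placeholders -- use your own queue URL and instance ID.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/video-jobs"
INSTANCE_ID = "i-0123456789abcdef0"

def lambda_handler(event, context):
    queued = 0
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Only queue mp4 uploads; ignore e.g. output files written back later.
        if not key.lower().endswith(".mp4"):
            continue
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )
        queued += 1
    # start_instances is a no-op for an already-running instance,
    # so it is safe to call on every upload.
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    return {"queued": queued}
```

The lambda's execution role would need `sqs:SendMessage` and `ec2:StartInstances` permissions for this to run.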

It's a very basic solution, and as I mentioned, many other designs are possible, involving auto-scaling groups, scaling policies based on SQS queue size, SSM run commands, and more.
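To make step 3 concrete, here is a minimal sketch of the worker that would run on the instance at startup, assuming the instance profile grants SQS and S3 access. The queue URL, the directories, and the script name `panorama.py` are placeholders for your actual setup: it long-polls the queue, downloads each new video into the input directory your script reads from, runs the script, and uploads whatever lands in the Output folder back to an `output/` prefix in the same bucket.

```python
import json
import os
import subprocess

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

# Hypothetical placeholders -- adjust to your queue, directories and script.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/video-jobs"
INPUT_DIR = "/home/ec2-user/input"
OUTPUT_DIR = "/home/ec2-user/Output"

def main():
    while True:
        # Long-poll the queue for up to 20 seconds per request.
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            bucket, key = job["bucket"], job["key"]
            local_path = os.path.join(INPUT_DIR, os.path.basename(key))

            # Copy the new video into the input directory the script reads from.
            s3.download_file(bucket, key, local_path)

            # Run the existing panorama script against the downloaded file.
            subprocess.run(["python3", "panorama.py", local_path], check=True)

            # Send everything the script wrote to Output/ back to the bucket.
            for name in os.listdir(OUTPUT_DIR):
                s3.upload_file(
                    os.path.join(OUTPUT_DIR, name), bucket, f"output/{name}"
                )

            # Delete the message only after the job succeeded, so a failed
            # run leaves the job in the queue to be retried.
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"]
            )

if __name__ == "__main__":
    main()
```

You could register this as a systemd service (or append it to the user data) so it starts whenever the lambda starts the instance, and optionally have it stop the instance once the queue stays empty.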
