
How do I transfer files from S3 to my EC2 instance whenever I add a new file to S3?

I have a Python script on my EC2 instance. It requires a video file as input, which is in an S3 bucket. How do I automate the process so that the EC2 instance starts running every time a new file is added to that bucket? I want the EC2 instance to recognize this new file, add it to a local directory where the Python script can process it, and create the output file. I then want to send this output file back to the bucket and store it there. I know the boto3 library is used to connect S3 to EC2; however, I am unclear how to trigger this automatically and look for new files without having to manually start my instance and copy everything.

Edit: I have a Python program which takes a video file (mp4), breaks it into frames, and stitches them together to create a set of small panorama images, which it stores in a folder named 'Output'. Since the program needs a video as input, it refers to a particular directory from which it is supposed to pick up the mp4 file and read it as input. What I now want is this: an S3 bucket is going to receive a video file from elsewhere, inside a folder within that bucket. I want any new mp4 file entering that bucket to be copied or sent to the input directory on my instance. When this happens, I want the Python program stored on that instance to be executed automatically, find the new video file in the input directory, process it into the small panoramas, and store them in the output directory, or better yet, send them to an output folder in the same S3 bucket.

There are many ways you could design a solution for this. They vary depending on how often you receive videos, whether the solution must be scalable and fault tolerant, how many videos you want to process in parallel, and more. I will provide just one, on the assumption that new videos are uploaded occasionally and no Auto Scaling groups are needed to process a large number of videos at the same time.

On the above assumption, one way could be as follows:

  1. The upload of a new video triggers a Lambda function via S3 event notifications.
  2. The Lambda function gets the video details (e.g. the S3 path) from the S3 event, submits them to an SQS queue, and starts your instance.
  3. Once the instance is started, your application on it polls the SQS queue for the details of the video file to process. This requires your application to be designed so that it launches at instance start, which can be done using modified user data, systemd unit files, and more.

It's a very basic solution, and as I mentioned, many other ways are possible, involving Auto Scaling groups, scaling policies based on SQS queue size, SSM Run Command, and more.

