
How do I copy a file from s3 bucket to ec2 instance using lambda function?

I am new to AWS so this might be basic. I have a bucket where I will be uploading mp4 files at random times. When this happens, I ultimately want the file to be copied to a particular directory on an EC2 instance which I have already created. Once it is copied, I want to execute a particular Python script (panorama.py) which is already on the instance. That program needs the video as input, which is why I want to copy the file from the bucket in the first place. It processes the video stored in that directory and generates its output (a few image files). How do I go about this?

This is what I have done so far:

  1. Created an S3 trigger notification for addition of new objects
  2. Created a Lambda function which gets triggered when the bucket receives a new file.
  3. Extracted the bucket name and file path from the event into two variables in the Lambda function.
  4. Added code in the Lambda function to start my instance.
  5. Created a shell script in the instance that runs the py file.
  6. Modified the user data file to run this shell script.

What I want to know is: how do I copy that particular file to the local directory before my Python file is executed? Can I copy the file to the EC2 instance from the Lambda function itself using some SSH command or something? Or should I write some command in the user data before the Python program is executed? If so, how do I pass the name of the bucket and the file path to the user data? I read something about SQS in another forum but I do not know how exactly I can achieve this. Can I copy my file before the instance is even started? Lastly, once the processing of the Python program is done, I want to send the output files back to the bucket in some folder and then stop the instance.
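For that last step, something like this is what I have in mind on the instance side (a rough sketch only; the output directory, bucket name, "results/" prefix and instance ID below are placeholders):

import os
import boto3

# Rough sketch of the final step, run on the EC2 instance after panorama.py finishes.
# output_dir, the bucket name, the "results/" prefix and the instance ID are placeholders.
s3 = boto3.client('s3')
ec2 = boto3.client('ec2', region_name='********')

output_dir = '/home/ec2-user/panorama_output'   # wherever panorama.py writes its images
bucket = 'my-bucket'                            # the same bucket the mp4 was uploaded to

# Upload every generated image into a "results/" folder in the bucket
for filename in os.listdir(output_dir):
    s3.upload_file(os.path.join(output_dir, filename), bucket, 'results/' + filename)

# Stop this instance once the results are uploaded
ec2.stop_instances(InstanceIds=['*******'])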

Also, there is no requirement for the instance to start only when the object is added to the bucket. I don't mind keeping the instance running continuously as well. However, this would mean I can't use the user data, right? So I thought it's not a good solution. If there is a way to do that as well, I am okay with that.
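If I do keep the instance running all the time, I imagine it could simply poll the bucket in a loop instead of relying on user data, roughly like this (the bucket name, prefix, local paths and polling interval are placeholders, and the set of processed keys is only kept in memory):

import time
import subprocess
import boto3

# Rough sketch of a long-running poller on the EC2 instance (placeholder names throughout).
s3 = boto3.client('s3')
bucket = 'my-bucket'
seen = set()  # keys already processed; kept in memory only, so lost on restart

while True:
    # list_objects_v2 returns up to 1000 keys per call; enough for a sketch
    resp = s3.list_objects_v2(Bucket=bucket, Prefix='uploads/')
    for obj in resp.get('Contents', []):
        key = obj['Key']
        if key.endswith('.mp4') and key not in seen:
            local_path = '/home/ec2-user/videos/' + key.split('/')[-1]
            s3.download_file(bucket, key, local_path)
            # run the existing script against the downloaded video
            subprocess.run(['python3', '/home/ec2-user/panorama.py', local_path], check=True)
            seen.add(key)
    time.sleep(15)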

This is my Lambda function code so far:

import boto3
from urllib.parse import unquote_plus

region = '********'
instances = ['*******']
ec2 = boto3.client('ec2', region_name=region)

def lambda_handler(event, context):
    print(f"Received raw event: {event}")

    # Bucket name where the file was uploaded
    source_bucket_name = event['Records'][0]['s3']['bucket']['name']

    # Object key (with path); keys in S3 events are URL-encoded, so decode it
    file_key_name = unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Start the instance that will process the video
    ec2.start_instances(InstanceIds=instances)
    print('started the instance: ' + str(instances))

I'd recommend executing code from your EC2 instance to read from your S3 bucket rather than trying to finagle a Lambda into doing SSH/SCP.

The flow could look like:

  • Object hits S3
  • S3 bucket event triggers Lambda to start EC2
  • Lambda also writes the full file path of new object(s) to a “new_files.txt” in S3
  • Use a bash script on EC2 startup to run a Python script that uses the boto3 SDK to read this designated “new_files.txt” (or apply any other logic, e.g. key paths based on timestamps) and GET the object from S3 programmatically.
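Roughly, the two halves could look like this (the “new_files.txt” key, bucket layout and local paths are just placeholders). On the Lambda side, after start_instances:

# Added to the existing lambda_handler, after ec2.start_instances(...):
# write the uploaded object's key to a well-known file in the same bucket.
# Note: put_object overwrites the previous contents, which is fine if only
# one video is processed per instance start.
s3 = boto3.client('s3')
s3.put_object(
    Bucket=source_bucket_name,
    Key='new_files.txt',
    Body=file_key_name.encode('utf-8'),
)

And on the EC2 side, a small script called from the startup bash script before panorama.py runs:

import boto3

# Run on the EC2 instance at startup; bucket name and download directory are placeholders.
s3 = boto3.client('s3')
bucket = 'my-bucket'

# Read the key that the Lambda wrote to new_files.txt
key = s3.get_object(Bucket=bucket, Key='new_files.txt')['Body'].read().decode('utf-8').strip()

# Download the video where panorama.py expects to find it
local_path = '/home/ec2-user/videos/' + key.split('/')[-1]
s3.download_file(bucket, key, local_path)

If uploads can overlap, an SQS queue (which you mentioned) is the usual way to hand over the list of keys instead of a single overwritten text file.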

Another option is to use AWS CLI commands (e.g. aws s3 cp) from bash, but that may be more or less tedious depending on what you are most comfortable with.
