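I am trying to automate data processing using AWS. I have set up an AWS Lambda function in Python that starts an EC2 instance, connects to it over SSH, runs an aws s3 sync followed by two processing scripts, and then stops the instance (code below).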
The problem is that the AWS CLI call that syncs the S3 bucket with the EC2 folder does not work when it is run this way, but when I manually SSH into the EC2 instance and run the command, it works. My aws-cli is configured with my access keys, and the EC2 instance has an S3 role that allows it full access.
import boto3
import time
import paramiko


def lambda_handler(event, context):
    # Create a low-level client representing S3
    s3 = boto3.client('s3')
    ec2 = boto3.resource('ec2', region_name='eu-west-a')
    instance_id = 'i-058456c79fjcde676'
    instance = ec2.Instance(instance_id)

    # Start the instance
    instance.start()
    # Allow some time for the instance to start
    time.sleep(30)

    # Print a few details of the instance
    print("Instance id - ", instance.id)
    print("Instance public IP - ", instance.public_ip_address)
    print("Instance private IP - ", instance.private_ip_address)
    print("Public dns name - ", instance.public_dns_name)
    print("----------------------------------------------------")

    print('Downloading pem file')
    s3.download_file('some_bucket', 'some_pem_file.pem', '/tmp/some_pem_file.pem')

    # Allow a few more seconds before connecting
    print('waiting for instance to start')
    time.sleep(30)

    print('sshing to instance')
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    privkey = paramiko.RSAKey.from_private_key_file('/tmp/some_pem_file.pem')
    # Username is most likely 'ec2-user', 'root' or 'ubuntu',
    # depending upon your EC2 AMI
    # s3_path = "s3://some_bucket/" + object_name
    ssh.connect(
        instance.public_dns_name, username='ubuntu', pkey=privkey)

    print('inside machine...running commands')
    # Sync the bucket, unzip the files, then run the processing script
    stdin, stdout, stderr = ssh.exec_command(
        'aws s3 sync s3://some_bucket/ ~/ec2_folder; '
        'bash ~/ec2_folder/unzip.sh; python3 ~/ec2_folder/process.py;')
    stdin.flush()
    data = stdout.read().splitlines()
    for line in data:
        print(line)

    print('done, closing ssh session')
    ssh.close()

    # Stop the instance
    instance.stop()
    return 'Triggered'
The use of an SSH tool is somewhat unusual.
Here are a few more 'cloud-friendly' options you might consider.
Systems Manager Run Command
The AWS Systems Manager Run Command allows you to execute a script on an Amazon EC2 instance (and, in fact, on any computer that is running the Systems Manager agent). It can even run the command on many (hundreds!) of instances/computers at the same time, keeping track of the success of each execution.
This means that, instead of connecting to the instance via SSH, the Lambda function could call the Run Command via an API call and Systems Manager would run the code on the instance.
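As a rough illustration of that idea, here is a minimal sketch of how the Lambda function might hand the commands to Run Command through boto3. It assumes the instance runs the Systems Manager agent and has an instance profile that allows Systems Manager to manage it. The helper names (build_run_command_request, run_on_instance) and the /home/ubuntu/ec2_folder path are hypothetical. Run Command normally executes shell commands as root, so ~ would not point at the ubuntu user's home directory; that is why the usage example spells the path out.

```python
def build_run_command_request(instance_id, commands):
    """Build the keyword arguments for the SSM send_command call.

    AWS-RunShellScript is the AWS-provided document for running shell
    commands on Linux instances; its 'commands' parameter is a list of strings.
    """
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": list(commands)},
    }


def run_on_instance(ssm_client, instance_id, commands):
    """Send the commands to the instance and return the command ID.

    ssm_client is expected to be boto3.client('ssm'); it is passed in so the
    function can also be exercised with a stand-in object.
    """
    response = ssm_client.send_command(
        **build_run_command_request(instance_id, commands)
    )
    return response["Command"]["CommandId"]


# Hypothetical usage inside the Lambda handler (paths are assumptions):
#
#   import boto3
#   ssm = boto3.client("ssm")
#   command_id = run_on_instance(ssm, "i-058456c79fjcde676", [
#       "aws s3 sync s3://some_bucket/ /home/ubuntu/ec2_folder",
#       "bash /home/ubuntu/ec2_folder/unzip.sh",
#       "python3 /home/ubuntu/ec2_folder/process.py",
#   ])
```

The returned command ID can later be used to look up the status and output of the execution, so the Lambda function does not need to hold a connection open while the job runs.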
Pull, Don't Push
Rather than 'pushing' the work to the instance, the instance could 'pull the work':
Trigger via HTTP
The instance could run a web server, listening for a message.
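One possible shape for this, sketched with only the Python standard library: the instance runs a small HTTP server that accepts a POST whose body is the message (for example, a filename), and the Lambda function sends that POST. Everything here is an assumption for illustration: the names start_trigger_server and send_trigger, port 8080, and the use of the request body as the message. A real setup would also need a security group that lets the Lambda function reach the port, plus some form of authentication.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(on_message):
    """Create a request handler class that passes each POST body to on_message."""

    class TriggerHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode("utf-8")
            # Run the work in a background thread so the caller is not
            # kept waiting while the file is processed.
            threading.Thread(target=on_message, args=(body,), daemon=True).start()
            self.send_response(202)  # Accepted: work started, not yet finished
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; real code would log requests

    return TriggerHandler


def start_trigger_server(on_message, host="0.0.0.0", port=8080):
    """Instance side: start listening in a background thread and return the server."""
    server = HTTPServer((host, port), make_handler(on_message))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_trigger(url, message):
    """Lambda side: POST the message to the instance and return the HTTP status."""
    request = urllib.request.Request(
        url, data=message.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

On the instance, on_message would be whatever function downloads and processes the named file; the Lambda function would call send_trigger with the instance's address and the object key.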
This answer is based on the additional information that you wish to shut down the EC2 instance between executions.
I would recommend:
- Put the processing script in the /var/lib/cloud/scripts/per-boot/ directory, which will cause it to run every time the instance is started (every time, not just the first time)
- The script can retrieve the User Data with curl http://169.254.169.254/latest/user-data/, so that it knows the filename from S3
- The script then runs sudo shutdown now -h to stop the instance

If there is a chance that another file might come while the instance is already processing a file, then I would slightly change the process:
By the way, things can sometimes go wrong, so it's worth putting a 'circuit breaker' in the script so that it does not shutdown the instance if you want to debug things. This could be a matter of passing a flag, or even adding a tag to the instance, which is checked before calling the shutdown command.
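To make the boot-time flow and the circuit breaker concrete, here is a minimal sketch of what a script placed in the per-boot directory might look like. It is written under several assumptions: the User Data contains just the S3 filename, the circuit breaker is a flag file at the hypothetical path /tmp/keep-alive, and the function names are made up for illustration. The metadata request uses the token-based (IMDSv2) form of the same user-data URL, since some instances are configured to reject requests without a token.

```python
import os
import subprocess
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"
KEEP_ALIVE_FLAG = "/tmp/keep-alive"  # hypothetical flag file acting as the circuit breaker


def fetch_user_data(timeout=2):
    """Read the instance's User Data from the metadata service (IMDSv2 style)."""
    token_request = urllib.request.Request(
        METADATA_BASE + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode("utf-8")
    data_request = urllib.request.Request(
        METADATA_BASE + "/user-data/",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(data_request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def shut_down_instance():
    """Halt the machine. Per-boot scripts run as root, so sudo is not needed here."""
    subprocess.run(["shutdown", "-h", "now"], check=True)


def run_job(process, fetch=fetch_user_data, shutdown=shut_down_instance,
            flag_path=KEEP_ALIVE_FLAG):
    """Fetch the filename, process it, then shut down unless the flag file exists.

    Returns True if a shutdown was requested, False if the circuit breaker
    kept the instance running.
    """
    filename = fetch()
    process(filename)
    if os.path.exists(flag_path):
        # Circuit breaker: someone wants to debug, so leave the instance up.
        return False
    shutdown()
    return True
```

With this arrangement, creating the flag file before a run (for example with touch /tmp/keep-alive) leaves the instance running afterwards so it can be inspected; removing the file restores the normal shut-down-when-done behaviour. Checking an instance tag instead of a file would work along the same lines.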