I am new to AWS and want to run a Python work script that is embarrassingly parallel on an EC2 instance (e.g. c4.4xlarge).
I have gone through existing questions on the topic but have not found a high-level answer covering the steps I need to take. I have AWS credentials and boto3 installed under my laptop's Python 2.
How do I structure a Python submission script that:
- uploads my work script and its input data to S3,
- launches an EC2 instance and runs the work script on it, and
- shuts the instance down when the work is done?
In addition, within my Python work script, how do I save its results back to S3?
Finally, how do I ensure that the Python environment I access via AWS has all the packages needed to successfully run my work script?
Sorry if the question is too high-level and for any conceptual mistakes. Thank you for any pointers!
To achieve this, I would suggest the following flow:
In the submission script:
- upload the work script's dependencies and input data to S3
- launch an EC2 instance whose user data starts the work script

In the EC2 instance:
- download the dependencies from S3
- run the work, write the results to a file, and upload it to S3
- terminate the instance from within
There are two simple ways to run commands on an EC2 instance: SSH, or the user-data attribute. For simplicity, and for your current use case, I recommend the user-data method.
First, you need to create an EC2 instance profile with permissions to download from and upload to the S3 bucket. Then you can launch an EC2 instance, install any Python or pip packages you need on it, and register it as an AMI.
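If you want to script the instance-profile creation with boto3 rather than click through the console, a minimal sketch could look like this; the role, profile, and bucket names are hypothetical placeholders:

import json
import boto3

iam = boto3.client('iam')

# All names below are hypothetical placeholders
role_name = 'worker-s3-role'
profile_name = 'worker-s3-profile'
bucket_name = 'my-work-bucket'

# Trust policy allowing EC2 to assume the role
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(RoleName=role_name,
                AssumeRolePolicyDocument=json.dumps(assume_role_policy))

# Inline policy granting get/put on the work bucket only
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::%s/*" % bucket_name,
    }],
}
iam.put_role_policy(RoleName=role_name, PolicyName='s3-access',
                    PolicyDocument=json.dumps(s3_policy))

# Wrap the role in an instance profile, which is what EC2 actually attaches
iam.create_instance_profile(InstanceProfileName=profile_name)
iam.add_role_to_instance_profile(InstanceProfileName=profile_name,
                                 RoleName=role_name)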
Here is some reference code. Note that it is Python 3, and the user-data and instance-ID parts assume a Windows instance.
submission.py:
import boto3

s3_client = boto3.client('s3')
ec2 = boto3.resource('ec2')

# Map each local dependency file to the key it will have in the S3 bucket
deps = {
    'remote': [
        "/path/to/s3-bucket/obj.txt"
    ],
    'local': [
        "/path/to/local-directory/obj.txt"
    ]
}

# Upload the dependencies so the instance can download them later
# (bucket_name and the variables below are placeholders you must define)
for remote, local in zip(deps['remote'], deps['local']):
    s3_client.upload_file(local, bucket_name, remote)

# User data runs once at first boot; here it just starts the worker script
user_data = f"""<powershell>
cd {path_to_instance_worker_dir}; python {path_to_instance_worker_script}
</powershell>
"""

instance = ec2.create_instances(
    MinCount=1,
    MaxCount=1,
    ImageId=image_id,
    InstanceType=your_ec2_type,
    KeyName=your_key_name,
    IamInstanceProfile={
        'Name': instance_profile_name
    },
    SecurityGroupIds=[
        instance_security_group,
    ],
    UserData=user_data
)
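Note that create_instances returns a list of instance objects; if the submission script should block until the machine is actually up, something like this (a sketch) works:

instance[0].wait_until_running()
print("Launched instance %s" % instance[0].id)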
instance_worker.py:
import subprocess

import boto3

s3_client = boto3.client('s3')
ec2_client = boto3.client('ec2')

# Same mapping as in submission.py, but this time we download
deps = {
    'remote': [
        "/path/to/s3-bucket/obj.txt"
    ],
    'local': [
        "/path/to/local-directory/obj.txt"
    ]
}

for remote, local in zip(deps['remote'], deps['local']):
    s3_client.download_file(bucket_name, remote, local)

result = do_work()
# write results to file, then upload it to S3
s3_client.upload_file(result_file, bucket_name, result_remote)

# Get the instance ID from inside (this is only for Windows machines)
p = subprocess.Popen(
    ["powershell.exe",
     "(Invoke-WebRequest -Uri 'http://169.254.169.254/latest/meta-data/instance-id').Content"],
    stdout=subprocess.PIPE)
out = p.communicate()[0]
instance_id = str(out.strip().decode('ascii'))

ec2_client.terminate_instances(InstanceIds=[instance_id, ])
In this code, I terminate the instance from within; to do that you must first obtain the instance_id, which is available from the instance metadata endpoint as shown above.
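As a side note, on a Linux instance (or on any OS) you can skip the PowerShell subprocess and read the instance ID from the metadata endpoint directly with the standard library:

import urllib.request

# Query the instance metadata service for this machine's own ID
with urllib.request.urlopen(
        'http://169.254.169.254/latest/meta-data/instance-id') as resp:
    instance_id = resp.read().decode('ascii')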
As for your last question, "how do I ensure that the Python environment I access via AWS has all the packages needed to successfully run my work script?":
In theory, you can use the user data to run any scripts or CLI commands you like, including installing Python and pip dependencies. But if the setup is too complicated or heavy to install at boot time, I suggest you build an image with everything pre-installed and launch from it, as mentioned before.
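For example, the user-data block from submission.py could be extended to install packages before the worker starts. This sketch assumes Python and pip are already on the AMI's PATH, and the package names are hypothetical:

# Hypothetical extension of the user-data block in submission.py
user_data = f"""<powershell>
pip install numpy pandas
cd {path_to_instance_worker_dir}; python {path_to_instance_worker_script}
</powershell>
"""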