I am new to AWS and want to run a Python work script that is embarrassingly parallel on an EC2 instance (e.g. c4.4xlarge).
I have gone through existing questions on the topic but have not found a high-level answer covering the steps I need to take. I have AWS credentials and boto3 installed under my laptop's Python 2.
How do I structure a Python submission script that:
- uploads my work script and its input data to S3,
- launches an EC2 instance and runs the work script on it, and
- shuts the instance down when the work is done?
In addition, within my Python work script, how do I save its results back to S3?
Finally, how do I ensure that the Python environment I access via AWS has all the packages needed to successfully run my work script?
Sorry if the question is too high-level and for any conceptual mistakes. Thank you for any pointers!
To achieve this, I would suggest the following flow:
In the submission script:
- upload the work script's dependencies and input data to S3
- launch an EC2 instance whose user data starts the work script

In the EC2 instance:
- download the dependencies from S3
- run the work, write the results to a file, and upload it to S3
- terminate the instance from within
There are two simple ways to run commands on an EC2 instance: SSH, or the user-data attribute. For simplicity, and for your current use case, I recommend the user-data method.
First, you need to create an EC2 instance profile with permissions to download from and upload to the S3 bucket. Then you can launch an EC2 instance, install any Python or pip packages you need on it, and register it as an AMI.
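If you want to script the instance-profile creation with boto3 rather than click through the console, a minimal sketch could look like this; the role, profile, and bucket names are hypothetical placeholders:

import json
import boto3

iam = boto3.client('iam')

# All names below are hypothetical placeholders
role_name = 'worker-s3-role'
profile_name = 'worker-s3-profile'
bucket_name = 'my-work-bucket'

# Trust policy allowing EC2 to assume the role
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(RoleName=role_name,
                AssumeRolePolicyDocument=json.dumps(assume_role_policy))

# Inline policy granting get/put on the work bucket only
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::%s/*" % bucket_name,
    }],
}
iam.put_role_policy(RoleName=role_name, PolicyName='s3-access',
                    PolicyDocument=json.dumps(s3_policy))

# Wrap the role in an instance profile, which is what EC2 actually attaches
iam.create_instance_profile(InstanceProfileName=profile_name)
iam.add_role_to_instance_profile(InstanceProfileName=profile_name,
                                 RoleName=role_name)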
Here is some reference code. Note that it is Python 3, and the user-data and instance-ID parts assume a Windows instance.
submission.py:
import boto3

s3_client = boto3.client('s3')
ec2 = boto3.resource('ec2')

# Map each local dependency file to the key it will have in the S3 bucket
deps = {
    'remote': [
        "/path/to/s3-bucket/obj.txt"
    ],
    'local': [
        "/path/to/local-directory/obj.txt"
    ]
}

# Upload the dependencies so the instance can download them later
# (bucket_name and the variables below are placeholders you must define)
for remote, local in zip(deps['remote'], deps['local']):
    s3_client.upload_file(local, bucket_name, remote)

# User data runs once at first boot; here it just starts the worker script
user_data = f"""<powershell>
cd {path_to_instance_worker_dir}; python {path_to_instance_worker_script}
</powershell>
"""

instance = ec2.create_instances(
    MinCount=1,
    MaxCount=1,
    ImageId=image_id,
    InstanceType=your_ec2_type,
    KeyName=your_key_name,
    IamInstanceProfile={
        'Name': instance_profile_name
    },
    SecurityGroupIds=[
        instance_security_group,
    ],
    UserData=user_data
)
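Note that create_instances returns a list of instance objects; if the submission script should block until the machine is actually up, something like this (a sketch) works:

instance[0].wait_until_running()
print("Launched instance %s" % instance[0].id)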
instance_worker.py:
import subprocess

import boto3

s3_client = boto3.client('s3')
ec2_client = boto3.client('ec2')

# Same mapping as in submission.py, but this time we download
deps = {
    'remote': [
        "/path/to/s3-bucket/obj.txt"
    ],
    'local': [
        "/path/to/local-directory/obj.txt"
    ]
}

for remote, local in zip(deps['remote'], deps['local']):
    s3_client.download_file(bucket_name, remote, local)

result = do_work()
# write results to file, then upload it to S3
s3_client.upload_file(result_file, bucket_name, result_remote)

# Get the instance ID from inside (this is only for Windows machines)
p = subprocess.Popen(
    ["powershell.exe",
     "(Invoke-WebRequest -Uri 'http://169.254.169.254/latest/meta-data/instance-id').Content"],
    stdout=subprocess.PIPE)
out = p.communicate()[0]
instance_id = str(out.strip().decode('ascii'))

ec2_client.terminate_instances(InstanceIds=[instance_id, ])
In this code, I terminate the instance from within; to do that you must first obtain the instance_id, which is available from the instance metadata endpoint as shown above.
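As a side note, on a Linux instance (or on any OS) you can skip the PowerShell subprocess and read the instance ID from the metadata endpoint directly with the standard library:

import urllib.request

# Query the instance metadata service for this machine's own ID
with urllib.request.urlopen(
        'http://169.254.169.254/latest/meta-data/instance-id') as resp:
    instance_id = resp.read().decode('ascii')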
As for your last question, "how do I ensure that the Python environment I access via AWS has all the packages needed to successfully run my work script?":
In theory, you can use the user data to run any scripts or CLI commands you like, including installing Python and pip dependencies. But if the setup is too complicated or heavy to install at boot time, I suggest you build an image with everything pre-installed and launch from it, as mentioned before.
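For example, the user-data block from submission.py could be extended to install packages before the worker starts. This sketch assumes Python and pip are already on the AMI's PATH, and the package names are hypothetical:

# Hypothetical extension of the user-data block in submission.py
user_data = f"""<powershell>
pip install numpy pandas
cd {path_to_instance_worker_dir}; python {path_to_instance_worker_script}
</powershell>
"""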