
Tutorial: Submitting python script to EC2 using boto3 with data from S3

I am new to AWS and want to run a python work script that is embarrassingly parallel on an EC2 instance (e.g. c4.4xlarge).

I have gone through questions on the topic, but have not found a high-level answer to the steps I need to take. I have AWS credentials and have boto3 installed on my laptop's python 2.

How do I structure a python submission script that:

  1. Connects to S3 where my python work script and dependencies are
  2. Launches an EC2 instance of a desired type
  3. Submits the python work script to be processed by the EC2 instance

In addition, within my python work script, how do I save the results of the work script back to S3?

Finally, how do I ensure that the python version that I access via AWS has all the packages that are needed to successfully run my python work script?

Sorry if the question is too high-level or contains any conceptual mistakes. Thank you for any pointers!

To achieve this, I would suggest adding some detail to your current flow:

In the submission script:

  • Upload/Refresh any dependencies on the S3 bucket.
  • Launch an EC2 instance.

In the EC2 instance:

  • Download dependencies.
  • Do work.
  • Upload the results to S3.
  • Terminate instance.

There are two simple ways to run commands on an EC2 instance: SSH, or the user-data attribute. For simplicity, and for your current use case, I would recommend the user-data method.

First, you need to create an EC2 instance profile with permission to download from and upload to the S3 bucket. Then you can create an EC2 instance, install any python or pip packages on it, and register it as an AMI.
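The original answer does not show this setup step in code, but here is a minimal sketch of creating such an instance profile with boto3's IAM client; the role and profile names are placeholders, and AmazonS3FullAccess is broader than strictly needed (you may want a policy scoped to your bucket).

import json

import boto3

iam = boto3.client('iam')

# Trust policy that allows EC2 to assume the role
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# 'ec2-s3-worker-role' and 'ec2-s3-worker-profile' are placeholder names
iam.create_role(
    RoleName='ec2-s3-worker-role',
    AssumeRolePolicyDocument=json.dumps(assume_role_policy)
)

# Broad S3 access for simplicity; scope it down to a single bucket in practice
iam.attach_role_policy(
    RoleName='ec2-s3-worker-role',
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess'
)

iam.create_instance_profile(InstanceProfileName='ec2-s3-worker-profile')
iam.add_role_to_instance_profile(
    InstanceProfileName='ec2-s3-worker-profile',
    RoleName='ec2-s3-worker-role'
)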

Here is some reference code. Note that this code is written for python3 and is suitable only for Windows machines.

submission.py:

import boto3

s3_client = boto3.client('s3')
ec2 = boto3.resource('ec2')

deps = {
    'remote' : [
        "/path/to/s3-bucket/obj.txt"
    ],

    'local' : [
        "/path/to/local-directory/obj.txt"
    ]
}

# Upload each local dependency to the S3 bucket (bucket_name is a placeholder)
for remote, local in zip(deps['remote'], deps['local']):
    s3_client.upload_file(local, bucket_name, remote)

# User data runs at first boot; the two path variables are placeholders for the
# worker script's location on the instance (PowerShell, so Windows AMIs only)
user_data = f"""<powershell>
cd {path_to_instance_worker_dir}; python {path_to_instance_worker_script}
</powershell>
"""

# image_id, your_ec2_type, your_key_name, instance_profile_name and
# instance_security_group are placeholders for your own AWS resources
instance = ec2.create_instances(
    MinCount=1,
    MaxCount=1,
    ImageId=image_id,
    InstanceType=your_ec2_type,

    KeyName=your_key_name,
    IamInstanceProfile={
            'Name': instance_profile_name
    },
    SecurityGroupIds=[
        instance_security_group,
    ],
    UserData=user_data
)
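Not part of the original answer, but if the submission script should block until the instance is actually running, the boto3 resource objects returned by create_instances expose a waiter:

# create_instances returns a list of Instance resources; wait for the first one
launched = instance[0]
launched.wait_until_running()
print("Launched instance:", launched.id)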

instance_worker:

import subprocess

import boto3

s3_client = boto3.client('s3')
ec2_client = boto3.client('ec2')

deps = {
    'remote' : [
        "/path/to/s3-bucket/obj.txt"
    ],

    'local' : [
        "/path/to/local-directory/obj.txt"
    ]
}

# Download each dependency from the S3 bucket onto the instance
for remote, local in zip(deps['remote'], deps['local']):
    s3_client.download_file(bucket_name, remote, local)

# do_work() stands in for the actual work function
result = do_work()

# write results to file 

s3_client.upload_file(result_file, bucket_name, result_remote)

# Get the instance ID from inside (this only works on Windows machines)
p = subprocess.Popen(
    ["powershell.exe", "(Invoke-WebRequest -Uri 'http://169.254.169.254/latest/meta-data/instance-id').Content"],
    stdout=subprocess.PIPE
)
out = p.communicate()[0]
instance_id = str(out.strip().decode('ascii'))

ec2_client.terminate_instances(InstanceIds=[instance_id])

In this code, I terminate the instance from within; to do that you must first obtain the instance_id (have a look here for more references).
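On Linux AMIs, a similar result can be obtained without PowerShell by reading the instance metadata endpoint directly; a minimal sketch using only the standard library, assuming IMDSv1 is available:

from urllib.request import urlopen

# Query the EC2 instance metadata service for this instance's ID
with urlopen('http://169.254.169.254/latest/meta-data/instance-id', timeout=2) as resp:
    instance_id = resp.read().decode('ascii')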

Finally, how do I ensure that the python version that I access via AWS has all the packages that are needed to successfully run my python work script?

In theory, you can use the user data to run any scripts or CLI commands you would like, including installing python and pip dependencies, but if the installation is too complicated or heavy, I would suggest building an image and launching from it, as mentioned before.
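As a rough illustration (not from the original answer), the user data in submission.py could be extended to install pip packages before running the worker; the package names below are only examples:

# Hypothetical example: install worker dependencies via user data (Windows AMI)
user_data = f"""<powershell>
python -m pip install --upgrade pip
python -m pip install boto3 numpy
cd {path_to_instance_worker_dir}; python {path_to_instance_worker_script}
</powershell>
"""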
