
Run parallel Python code on multiple AWS instances

I have a Python algorithm that can be parallelized fairly easily.

I don't have the resources locally to run the whole thing in an acceptable time frame.

For each work unit, I would like to be able to:

  1. Launch an AWS instance (EC2?)
  2. Send input data to the instance
  3. Run the Python code with the data as input
  4. Return the result and aggregate it when all instances are done
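Since the question mentions Boto3, step 1 can be sketched directly with it. This is a minimal, illustrative sketch only: the AMI ID, instance type, script path, and S3 URI below are placeholders, not real resources.

```python
def build_user_data(s3_input_uri: str) -> str:
    """Build a shell script that runs at instance boot: it fetches the
    input data from S3 and runs the algorithm on it. The S3 URI and the
    script path are placeholders for your own setup."""
    return (
        "#!/bin/bash\n"
        f"aws s3 cp {s3_input_uri} /tmp/input.dat\n"
        "python3 /opt/myapp/algorithm.py /tmp/input.dat\n"
    )

if __name__ == "__main__":
    # boto3 is imported here so the helper above stays importable
    # without the AWS SDK installed.
    import boto3

    ec2 = boto3.client("ec2")
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="t3.micro",          # pick a size to match the workload
        MinCount=1,
        MaxCount=1,
        UserData=build_user_data("s3://my-bucket/input/unit-0001.dat"),
    )
```

The user-data script runs once at first boot, which is one common way to hand each instance its work unit without logging in.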

What is the best way to do this?

Is AWS Lambda meant for this purpose? Can this be done with just Boto3?

I am completely lost here.

Thank you

A common architecture for running tasks in parallel is:

  • Put inputs into an Amazon SQS queue
  • Run workers on multiple Amazon EC2 instances that:
    • Retrieve a message from the SQS queue
    • Process the data
    • Write results to Amazon S3
    • Delete the message from the SQS queue (to signify that the job is complete)
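The worker loop above can be sketched with boto3. This assumes a queue URL and bucket of your own, and JSON message bodies of the form `{"id": ..., "data": ...}`; the `process` function is a stand-in for the real algorithm.

```python
import json

def process(data):
    """Placeholder for the actual algorithm; here it just sums numbers."""
    return sum(data)

def worker_loop(queue_url: str, bucket: str) -> None:
    """Long-poll SQS, process each message, upload the result to S3,
    then delete the message to signify the job is complete.
    queue_url and bucket are assumptions for your own resources."""
    import boto3  # imported here so process() is testable without the SDK

    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling: avoids busy-looping on empty queue
        )
        for msg in resp.get("Messages", []):
            payload = json.loads(msg["Body"])
            result = process(payload["data"])
            s3.put_object(
                Bucket=bucket,
                Key=f"results/{payload['id']}.json",
                Body=json.dumps({"id": payload["id"], "result": result}),
            )
            # Deleting only after the result is safely in S3 means a crashed
            # worker's message becomes visible again and is retried.
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
```

Deleting the message last is what makes the pattern fault-tolerant: if a worker dies mid-task, SQS redelivers the message after the visibility timeout.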

You can then retrieve all the results from Amazon S3. Depending on their format, you could even use Amazon Athena to run SQL queries against all the output files simultaneously.
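Collecting the results back out of S3 might look like the following sketch, assuming the `results/{id}.json` key layout from the worker sketch above; the bucket name, prefix, and `aggregate` function are placeholders.

```python
import json

def aggregate(results):
    """Combine per-unit results; a simple sum stands in for real aggregation."""
    return sum(r["result"] for r in results)

def collect_results(bucket: str, prefix: str = "results/"):
    """Download every result object under the prefix and aggregate them.
    Bucket name and key layout are assumptions, not fixed by AWS."""
    import boto3  # imported here so aggregate() is testable without the SDK

    s3 = boto3.client("s3")
    results = []
    # Paginate: list_objects_v2 returns at most 1000 keys per call.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            results.append(json.loads(body))
    return aggregate(results)
```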

You could even run multiple workers on the same instance if each worker is single-threaded and there is spare CPU available.
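Running several single-threaded workers on one machine needs nothing beyond the standard library; a minimal illustration with `multiprocessing` (the work function is a placeholder):

```python
import os
from multiprocessing import Pool

def work_unit(data):
    """Stand-in for the real algorithm on one unit of input."""
    return sum(data)

if __name__ == "__main__":
    inputs = [[1, 2], [3, 4], [5, 6]]
    # One worker process per available CPU core.
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(work_unit, inputs)
    print(results)  # → [3, 7, 11]
```

The same `work_unit` function can serve as the body of the SQS worker loop, so local parallelism and the fleet-of-instances setup share one code path.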
