
Python multiprocessing BETWEEN Amazon cloud instances

I'm looking to run a long-running Python analysis process on a few Amazon EC2 instances. The code already runs using the Python multiprocessing module and can take advantage of all the cores on a single machine.

The analysis is completely parallel and the instances don't need to communicate with one another. All of the work is "file-based" and each process works on a single file individually ... so I was planning on just mounting the same S3 volume across all of the nodes.

I was wondering if anyone knew of any tutorials (or had any suggestions) for setting up the multiprocessing environment so I can run it on an arbitrary number of compute instances at the same time.

The Python docs give you a good setup for running multiprocessing on multiple machines via remote managers. Using S3 is a good way to share files across EC2 instances, but with multiprocessing you can also share queues and pass data between nodes, as in the sketch below.
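A minimal sketch of that remote-manager pattern, assuming a shared queue of file names; the hostname, port, file names, and authkey are placeholders you would replace:

```python
# server.py -- run on one instance to expose a shared task queue.
from multiprocessing.managers import BaseManager
import queue

task_queue = queue.Queue()

class QueueManager(BaseManager):
    pass

# Register a callable that hands the queue out to connecting workers.
QueueManager.register('get_tasks', callable=lambda: task_queue)

if __name__ == '__main__':
    for name in ['data1.csv', 'data2.csv']:  # placeholder file list
        task_queue.put(name)
    # Bind on port 50000; open this port in the instances' security group.
    manager = QueueManager(address=('', 50000), authkey=b'change-me')
    manager.get_server().serve_forever()
```

```python
# worker.py -- run on each instance; pulls file names until the queue is empty.
from multiprocessing.managers import BaseManager
import queue

class QueueManager(BaseManager):
    pass

QueueManager.register('get_tasks')

if __name__ == '__main__':
    # 'server-host' stands in for the serving instance's address.
    manager = QueueManager(address=('server-host', 50000), authkey=b'change-me')
    manager.connect()
    tasks = manager.get_tasks()
    while True:
        try:
            filename = tasks.get_nowait()
        except queue.Empty:
            break
        print('processing', filename)  # stand-in for the real analysis
```

Each worker can still fan out locally with a multiprocessing.Pool over the files it pulls, so you keep full use of every core.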

If your tasks fit Hadoop's model, it is a very good way to extract parallelism across machines, but if you need a lot of IPC then building your own solution with multiprocessing isn't that bad.

Just make sure you put your machines in the same security group :-)

I would use dumbo. It is a Python wrapper for Hadoop that is compatible with Amazon Elastic MapReduce. Write a little wrapper around your code to integrate with dumbo; see the sketch below. Note that you probably want a map-only job with no reduce step.
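A rough sketch of what such a wrapper might look like, assuming dumbo.run accepts a mapper alone for a map-only job; analyze_file is a hypothetical stand-in for your own analysis routine:

```python
# analyze.py -- map-only dumbo job, launched with something like:
#   dumbo start analyze.py -input s3://bucket/filelist -output s3://bucket/out
def analyze_file(path):
    return len(path)  # hypothetical stand-in for the real per-file analysis

def mapper(key, value):
    # Hadoop streaming feeds records one at a time; each value is
    # assumed here to be a line naming one input file.
    yield value, analyze_file(value)

if __name__ == '__main__':
    import dumbo
    dumbo.run(mapper)  # no reducer, so the job stays map-only
```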

I've been digging into IPython recently, and it looks like it supports parallel processing across multiple hosts right out of the box:

http://ipython.org/ipython-doc/stable/html/parallel/index.html
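A minimal sketch using that interface, assuming you have started a controller and engines on your instances (e.g. with `ipcluster start`) and that analyze_file again stands in for your own per-file routine:

```python
from IPython.parallel import Client

def analyze_file(path):
    return len(path)  # placeholder for the real analysis

if __name__ == '__main__':
    rc = Client()                       # connect to the running controller
    view = rc.load_balanced_view()      # spread tasks across all engines
    files = ['data1.csv', 'data2.csv']  # e.g. keys listed from your S3 bucket
    results = view.map_sync(analyze_file, files)
    print(results)
```

The engines can live on different EC2 instances, so this scales past one machine without changing the analysis code itself.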
