
How to limit the workers' throughput in imap_unordered?

I'm using imap_unordered from the multiprocessing library to parallelize some data-processing computations. The problem is that the master process, which consumes the returned iterator, sometimes processes results slower than the workers produce them (because of network/disk speed limits, etc.). This causes the program to consume all available memory and crash.

I'd expect the returned iterator to have some internal size limit, so that when it is consumed too slowly, the internal queue fills up and blocks the producers (the asynchronous workers). But apparently this is not the case.

What would be the easiest way to achieve this behavior?

You might want to consider using a bounded Queue:

import multiprocessing  # Don't use queue.Queue!

MAX_QUEUE_SIZE = 20

q = multiprocessing.Queue(MAX_QUEUE_SIZE)  # Inserts will block if the queue is full

And then, in your master process:

while True:
    do_something_with(q.get())  # blocks until a worker produces a result

And in your children processes:

while True:
    q.put(create_something())  # blocks while the queue is full, throttling this worker

You'll have to rewrite a bit of the machinery (i.e. you won't be able to use imap_unordered anymore), but that should be reasonably straightforward using Pool's lower-level methods, or by managing the worker processes yourself.
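
Putting the pieces together, here is a minimal, runnable sketch of such a rewrite. It manages the workers with multiprocessing.Process directly rather than Pool; the worker function, the squaring placeholder, NUM_WORKERS, and the sentinel protocol are all illustrative assumptions, not part of the original answer:

import multiprocessing

MAX_QUEUE_SIZE = 20
NUM_WORKERS = 4
SENTINEL = None  # marks "no more work" / "this worker is finished"

def worker(tasks, results):
    # Consume tasks until the sentinel arrives. results.put() blocks
    # whenever the bounded queue is full, so a slow master throttles us.
    for item in iter(tasks.get, SENTINEL):
        results.put(item * item)  # placeholder for the real computation
    results.put(SENTINEL)  # tell the master this worker is done

if __name__ == "__main__":
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue(MAX_QUEUE_SIZE)  # bounded: this is the throttle

    workers = [multiprocessing.Process(target=worker, args=(tasks, results))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()

    for item in range(100):  # enqueue the work items
        tasks.put(item)
    for _ in workers:
        tasks.put(SENTINEL)  # one stop signal per worker

    done = 0
    while done < NUM_WORKERS:
        result = results.get()  # results arrive in completion order, like imap_unordered
        if result is SENTINEL:
            done += 1
        else:
            print(result)  # consuming slowly here no longer exhausts memory

    for w in workers:
        w.join()

The key point is that the results queue is bounded, so workers block on results.put() as soon as the master falls behind, capping memory usage at roughly MAX_QUEUE_SIZE pending results.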
