
How to limit the workers' throughput in imap_unordered?

I'm using imap_unordered from the multiprocessing library to parallelize some data-processing computations. The problem is that the master process, which consumes the returned iterator, sometimes processes results slower than the workers produce them (because of network/disk speed limits, etc.). This causes the program to consume all available memory and crash.

I'd expect the returned iterator to have some internal size limit, so that when it is consumed too slowly, the internal queue fills up and blocks the producers (the asynchronous workers). But apparently this is not the case.

What would be the easiest way to achieve this behavior?

You might want to consider using a bounded Queue:

import multiprocessing  # Don't use queue.Queue!

MAX_QUEUE_SIZE = 20

q = multiprocessing.Queue(MAX_QUEUE_SIZE)  # Inserts will block if the queue is full

And then, in your master process:

while True:
    do_something_with(q.get())  # blocks until a worker produces a result

And in your children processes:

while True:
    q.put(create_something())  # blocks while the queue is full, throttling this worker

You'll have to rewrite a bit of the machinery (i.e. you won't be able to use imap_unordered anymore), but that should be reasonably straightforward using Pool's lower-level methods, or by managing the worker processes yourself.
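
Putting the pieces together, here is a minimal, runnable sketch of such a rewrite. It manages the workers with multiprocessing.Process directly rather than Pool; the worker function, the squaring placeholder, NUM_WORKERS, and the sentinel protocol are all illustrative assumptions, not part of the original answer:

import multiprocessing

MAX_QUEUE_SIZE = 20
NUM_WORKERS = 4
SENTINEL = None  # marks "no more work" / "this worker is finished"

def worker(tasks, results):
    # Consume tasks until the sentinel arrives. results.put() blocks
    # whenever the bounded queue is full, so a slow master throttles us.
    for item in iter(tasks.get, SENTINEL):
        results.put(item * item)  # placeholder for the real computation
    results.put(SENTINEL)  # tell the master this worker is done

if __name__ == "__main__":
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue(MAX_QUEUE_SIZE)  # bounded: this is the throttle

    workers = [multiprocessing.Process(target=worker, args=(tasks, results))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()

    for item in range(100):  # enqueue the work items
        tasks.put(item)
    for _ in workers:
        tasks.put(SENTINEL)  # one stop signal per worker

    done = 0
    while done < NUM_WORKERS:
        result = results.get()  # results arrive in completion order, like imap_unordered
        if result is SENTINEL:
            done += 1
        else:
            print(result)  # consuming slowly here no longer exhausts memory

    for w in workers:
        w.join()

The key point is that the results queue is bounded, so workers block on results.put() as soon as the master falls behind, capping memory usage at roughly MAX_QUEUE_SIZE pending results.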
