I'm using imap_unordered from the multiprocessing library to parallelize some data-processing computations. The problem is that the master process, which reads the returned iterator, sometimes consumes results more slowly than the workers produce them (because of network/disk speed limits, etc.). This causes the program to consume all available memory and eventually crash.
I'd expect the internal result queue to have a size limit, so that when the returned iterator is consumed too slowly, the queue fills up and blocks the producers (the asynchronous workers). But apparently this is not the case.
What would be the easiest way to achieve such behavior?
You might want to consider using a bounded Queue:
import multiprocessing # Don't use queue.Queue!
MAX_QUEUE_SIZE = 20
q = multiprocessing.Queue(MAX_QUEUE_SIZE) # Inserts will block if the queue is full
And then, in your master process:
while True:
    do_something_with(q.get())
And in your children processes:
while True:
    q.put(create_something())
You'll have to rewrite a bit of the machinery (i.e., you won't be able to use imap_unordered anymore), but that should be reasonably straightforward using Pool's lower-level methods.
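Putting the fragments above together, here is a minimal runnable sketch of the bounded producer/consumer pattern. The `worker` function, the squaring computation, the sentinel-based shutdown, and the worker count are all illustrative assumptions, not part of the original answer:

```python
import multiprocessing

MAX_QUEUE_SIZE = 20

def worker(tasks, results):
    # Pull tasks until we see the None sentinel. results.put() blocks
    # whenever the master falls behind, which bounds memory use.
    for item in iter(tasks.get, None):
        results.put(item * item)  # stand-in for the real computation

def main():
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue(MAX_QUEUE_SIZE)  # inserts block when full

    workers = [multiprocessing.Process(target=worker, args=(tasks, results))
               for _ in range(4)]
    for w in workers:
        w.start()

    n_tasks = 100
    for i in range(n_tasks):
        tasks.put(i)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker so each one exits

    total = 0
    for _ in range(n_tasks):
        total += results.get()  # master consumes at its own pace

    for w in workers:
        w.join()
    return total

if __name__ == "__main__":
    print(main())
```

Because `results` is capped at MAX_QUEUE_SIZE, at most that many unconsumed results exist at any moment; workers simply block in `results.put()` until the master catches up, which is exactly the backpressure the question asks for.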