
Python Multiprocessing - 'Queue' object has no attribute 'task_done' / 'join'

I am rewriting a threaded program to use a multiprocessing queue in an attempt to speed up a large calculation. I have gotten it 95% of the way there, but I can't figure out how to signal when the Queue is empty using multiprocessing.


My original code is something like this:

from Queue import Queue  # Python 2; the module is named `queue` in Python 3
from threading import Thread

num_fetch_threads = 4
enclosure_queue = Queue()

for i in range(num_fetch_threads):
  worker = Thread(target=run_experiment, args=(i, enclosure_queue))
  worker.setDaemon(True)
  worker.start()

for experiment in experiment_collection:
  enclosure_queue.put((experiment, otherVar))

enclosure_queue.join()

And the worker function looks like this:

def run_experiment(i, q):
  while True:
    experiment, otherVar = q.get()  # blocks until an item is available
    # ... do stuff ...
    q.task_done()

My new code is something like this:

from multiprocessing import Process, Queue

num_fetch_threads = 4
enclosure_queue = Queue()

for i in range(num_fetch_threads):
  worker = Process(target=run_experiment, args=(i, enclosure_queue))
  worker.daemon = True
  worker.start()

for experiment in experiment_collection:
  enclosure_queue.put((experiment, otherVar))

worker.join() ## I only put this here because enclosure_queue.join() is not available

And the new queue function:

def run_experiment(i, q):
  while True:
    experiment, otherVar = q.get()
    # ... do stuff ...
    ## not sure what should go here

I have been reading the docs and Googling, but I can't figure out what I am missing. I know that task_done() / join() are not part of the multiprocessing Queue class, but it's not clear what I am supposed to use instead.

"They differ in that Queue lacks the task_done() and join() methods introduced into Python 2.5's Queue.Queue class." Source

But without either of those, I'm not sure how to tell when the queue is done, and how to continue on with the program.

Consider using a multiprocessing.Pool instead of managing workers manually. Pool handles dispatching tasks to workers through convenient methods like map and apply, and supports .close and .join. Pool takes care of the queues between processes and of collecting the results; in particular, map blocks until every task has finished, so there is no need to signal completion yourself. Here's how your code might look using multiprocessing.Pool:

from multiprocessing import Pool

def do_experiment(exp):
    # run the experiment `exp`; `p.map` calls this once per item, in a worker process
    return result

p = Pool() # automatically scales to the number of CPUs available

results = p.map(do_experiment, experiment_collection)
p.close()
p.join()
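
If you would rather keep the explicit queue-based design, note that multiprocessing also provides JoinableQueue, which does have the task_done() and join() methods. Another common option is the sentinel ("poison pill") pattern: push one sentinel per worker onto the queue so each worker knows when to exit, then join the worker processes themselves. Here is a minimal sketch of that pattern, assuming experiment_collection and otherVar are defined as in your original code:

from multiprocessing import Process, Queue

num_fetch_threads = 4
SENTINEL = None  # marks "no more work" for the workers

def run_experiment(i, q):
    while True:
        item = q.get()        # blocks until an item is available
        if item is SENTINEL:  # sentinel seen: no more work is coming
            break
        experiment, otherVar = item
        # ... do stuff ...

if __name__ == '__main__':
    enclosure_queue = Queue()
    workers = [Process(target=run_experiment, args=(i, enclosure_queue))
               for i in range(num_fetch_threads)]
    for worker in workers:
        worker.start()

    for experiment in experiment_collection:
        enclosure_queue.put((experiment, otherVar))

    # one sentinel per worker, so every worker's loop can exit
    for _ in range(num_fetch_threads):
        enclosure_queue.put(SENTINEL)

    # joining the processes (rather than the queue) signals completion
    for worker in workers:
        worker.join()

Because the main process joins the workers explicitly, they no longer need to be daemonized, and the joins only return once every queued experiment has been processed.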
