I am rewriting a threaded process into a multiprocessing queue to attempt to speed up a large calculation. I have gotten it 95% of the way there, but I can't figure out how to signal when the Queue is empty using multiprocessing.
My original code is something like this:
from Queue import Queue
from threading import Thread

num_fetch_threads = 4
enclosure_queue = Queue()

for i in range(num_fetch_threads):
    worker = Thread(target=run_experiment, args=(i, enclosure_queue))
    worker.setDaemon(True)
    worker.start()

for experiment in experiment_collection:
    enclosure_queue.put((experiment, otherVar))

enclosure_queue.join()
And the queue function like this:
def run_experiment(i, q):
    while True:
        # ... do stuff ...
        q.task_done()
My new code is something like this:
from multiprocessing import Process, Queue

num_fetch_threads = 4
enclosure_queue = Queue()

for i in range(num_fetch_threads):
    worker = Process(target=run_experiment, args=(i, enclosure_queue))
    worker.daemon = True
    worker.start()

for experiment in experiment_collection:
    enclosure_queue.put((experiment, otherVar))

worker.join()  # I only put this here because enclosure_queue.join() is not available
And the new queue function:
def run_experiment(i, q):
    while True:
        # ... do stuff ...
        # not sure what should go here
I have been reading the docs and searching Google, but can't figure out what I am missing. I know that task_done / join are not part of the multiprocessing Queue class, but it's not clear what I am supposed to use instead.

"They differ in that Queue lacks the task_done() and join() methods introduced into Python 2.5's Queue.Queue class." (source: the multiprocessing documentation)

But without either of those, I'm not sure how the queue knows it is done, or how to continue on with the program.
Consider using a multiprocessing.Pool instead of managing workers manually. Pool handles dispatching tasks to workers through convenient functions like map and apply, and it supports close and join methods. Pool takes care of the queues between processes and of collecting the results. Here's how your code might look using multiprocessing.Pool:
from multiprocessing import Pool

def do_experiment(exp):
    # run the experiment `exp`; will be called by `p.map`
    return result

p = Pool()  # defaults to the number of CPUs available
results = p.map(do_experiment, experiment_collection)
p.close()
p.join()