Python: wait for processes in a multiprocessing Pool to complete without closing the Pool or using map()

I have a piece of code like below:

pool = multiprocessing.Pool(10)
for i in range(300):
    for m in range(500):
        data = do_some_calculation(resource)
        pool.apply_async(paralized_func, data, callback=update_resource)
    # need to wait for all processes finish
    # {...}
    # Summarize resource
    do_something_with_resource(resource)

So basically I have two nested loops. I initialize the process pool once, outside the loops, to avoid the overhead of re-creating it. At the end of the inner loop, I want to summarize the results of all processes.

The problem is that I can't use pool.map() to wait, because the input data varies. I can't use pool.close() and pool.join() either, because I still need the pool in the next iteration of the outer loop.

What is a good way to wait for the processes to finish in this case?

I tried checking pool._cache at the end of the inner loop:

while len(pool._cache) > 0:
    sleep(0.001)

This works, but it relies on an undocumented private attribute and looks hacky. Is there a better way to do this?

apply_async returns an AsyncResult object, which has a wait([timeout]) method you can use.

Example:

pool = multiprocessing.Pool(10)
for i in range(300):
    results = []
    for m in range(500):
        data = do_some_calculation(resource)
        result = pool.apply_async(paralized_func, data, callback=update_resource)
        results.append(result)
    [result.wait() for result in results]
    # need to wait for all processes finish
    # {...}
    # Summarize resource
    do_something_with_resource(resource)

I haven't run this code, since it isn't executable as-is, but it should work.

Alternatively, you can use a callback to count how many results have come back.

pool = multiprocessing.Pool(10)
for i in range(300):
    finished = [0]  # mutable container so the callback can update the count

    def count_result(x):
        finished[0] += 1

    for m in range(500):
        data = do_some_calculation(resource)
        pool.apply_async(paralized_func, data, callback=count_result)

    # need to wait for all processes to finish
    while finished[0] < 500:
        sleep(0.001)
    # Summarize resource
    do_something_with_resource(resource)

There's an issue with the most upvoted answer:

[result.wait() for result in results]

will not act as a barrier if some of the workers raise an exception: wait() also returns when a task fails, so execution proceeds past it. Here is a possible check that all workers finished successfully.

import time

while True:
    time.sleep(1)
    # successful() raises if results are not ready yet; retry in that case
    try:
        ready = [result.ready() for result in results]
        successful = [result.successful() for result in results]
    except Exception:
        continue
    # exit loop if all tasks returned success
    if all(successful):
        break
    # raise exception reporting exceptions received from workers
    if all(ready) and not all(successful):
        raise Exception(f'Workers raised following exceptions {[result._value for result in results if not result.successful()]}')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any questions please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM