Python multiprocessing pool: how can I know when all the workers in the pool have finished?

I am running a multiprocessing pool in Python with ~2000 tasks mapped to 24 workers. Each task creates a file based on some data analysis and web services.

I want to run a new task once all the tasks in the pool have finished. How can I tell when all the processes in the pool have finished?

You want to use the join method, which blocks the calling (main) process until the sub-process it is called on has ended. The documentation describes it as:

Block the calling thread until the process whose join() method is called terminates or until the optional timeout occurs.

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    processes = []
    for i in range(10):
        p = Process(target=f, args=('bob',))
        processes.append(p)

    # start every process first so that they run concurrently
    for p in processes:
        p.start()

    # then wait for each one to terminate
    for p in processes:
        p.join()

    # only get here once all processes have finished.
    print('finished!')
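
The quoted documentation also mentions an optional timeout. As a rough sketch of how that could be used: join() returns None whether or not the timeout expired, so the process has to be checked afterwards with is_alive(). The names slow_task and wait_with_timeout are hypothetical helpers made up for this illustration, not part of multiprocessing.

```python
import time
from multiprocessing import Process


def slow_task(seconds):
    """Hypothetical task that just sleeps for a while."""
    time.sleep(seconds)


def wait_with_timeout(proc, timeout):
    """Return True if proc finished within `timeout` seconds, else False."""
    proc.join(timeout)
    # join() returns None either way; is_alive() tells us whether it timed out
    return not proc.is_alive()


if __name__ == "__main__":
    p = Process(target=slow_task, args=(0.5,))
    p.start()
    print("finished within 0.05 s?", wait_with_timeout(p, 0.05))
    p.join()  # now wait without a timeout
    print("finished after full join?", wait_with_timeout(p, 0))
```

A timeout like this can be handy when a follow-up step should report progress or give up, instead of blocking indefinitely on a stuck worker.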

EDIT:

To use join with pools:

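    from multiprocessing import Pool

    if __name__ == '__main__':
        pool = Pool(processes=4)  # start 4 worker processes
        result = pool.apply_async(f, (10,))  # do some work (f as defined above)
        pool.close()  # no more tasks can be submitted after this
        pool.join()  # block at this line until all processes are done
        print("completed")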
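
For the situation in the question (many tasks mapped onto a fixed number of workers, followed by one more step), a minimal sketch might look like the following. The names write_report and run_all, and the task and worker counts, are hypothetical placeholders. Note that pool.map() already blocks until every task has returned, so close() and join() here mainly ensure that the worker processes themselves have exited.

```python
import os
import tempfile
from multiprocessing import Pool


def write_report(args):
    """Hypothetical task: create one file and return its path."""
    out_dir, i = args
    path = os.path.join(out_dir, f"{i}.txt")
    with open(path, "w") as fh:
        fh.write(f"result {i}\n")
    return path


def run_all(out_dir, n_tasks=20, n_workers=4):
    """Run every task, wait for the workers to exit, and return the paths."""
    pool = Pool(processes=n_workers)
    # map() blocks until all tasks have produced a result
    paths = pool.map(write_report, [(out_dir, i) for i in range(n_tasks)])
    pool.close()  # no further tasks will be submitted
    pool.join()   # wait for the worker processes to exit
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        paths = run_all(out_dir)
        # The follow-up task runs only after every file has been written
        print(len(paths), "files written")
```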

You can use the wait() method of the AsyncResult object (which is what apply_async returns).

import multiprocessing

def create_file(i):
    open(f'{i}.txt', 'a').close()

if __name__ == '__main__':
    # With no argument, Pool() uses the number of CPUs detected on the machine
    with multiprocessing.Pool() as pool:

        # Launch the first round of tasks, building a list of AsyncResult objects
        results = [pool.apply_async(create_file, (i,)) for i in range(50)]

        # Wait for every task to finish
        for result in results:
            result.wait()

        # {start your next task... the pool is still available}

    # {when you reach here, the with block has shut the pool down}

This method works even if you plan to use your pool again and don't want to close it, which, as @dano pointed out, might be the case. For example, you might need to keep it around for the next iteration of an algorithm. Just be sure to close it when you're done.
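
As an illustration of reusing the pool across iterations, here is a small sketch. It uses get() rather than wait(): both block until the task is done, but get() also returns the result and re-raises any exception raised in the worker, which wait() does not. The functions square and run_round are hypothetical stand-ins for real work.

```python
import multiprocessing


def square(x):
    """Hypothetical stand-in for one unit of work."""
    return x * x


def run_round(pool, inputs):
    """Submit one round of tasks and block until all of them are done."""
    results = [pool.apply_async(square, (x,)) for x in inputs]
    # get() waits like wait() does, but also re-raises any worker exception
    return [r.get() for r in results]


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        first = run_round(pool, range(5))
        # The same pool is reused for a second round that depends on the first
        second = run_round(pool, first)
        print(first, second)
```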
