
Will 'multiprocessing' automatically close finished child processes?

I used the multiprocessing library to create a pool of worker processes to handle a list of files (20+ files).

When I run the .py file, I set the pool size to 4, but in cmd it showed more than 10 processes, and most of them had been running for a long time. Because the files are large and take a long time to process, I'm not sure whether the processes are hanging or still executing.

So my questions are:

If they're executing, how do I limit the number of processes to exactly 4?

If they're hanging, that means child processes do not shut down after finishing. Can I make them shut down automatically when finished?

import sys
from multiprocessing import Pool
poolNum = int(sys.argv[1])
pool = Pool(poolNum)
pool.map(processFunc, fileList)

It won't, not until the Pool is close-d or terminate-d. (IIRC, Pools, at least at present, have a reference cycle involved, so even when the last live reference to the Pool goes away, the Pool is not deterministically collected, even on CPython, which uses reference counting and normally has deterministic cleanup behavior.)
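A minimal, runnable sketch of that behavior (using a stand-in `square` function rather than the question's `processFunc`): the workers survive `map` returning and only exit after an explicit `close`/`join`:

```python
import multiprocessing

def square(x):
    return x * x

def main():
    pool = multiprocessing.Pool(4)
    results = pool.map(square, range(8))
    # map has returned, but the 4 worker processes are still alive:
    print(len(multiprocessing.active_children()) > 0)  # True
    pool.close()  # no further tasks may be submitted
    pool.join()   # block until every worker has exited
    print(len(multiprocessing.active_children()))      # 0
    print(results)

if __name__ == '__main__':
    main()
```

Calling `terminate()` instead of `close()` would kill the workers immediately, even mid-task; `close()` lets outstanding work finish first.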

Since you're using map, your work is definitely done when map returns, so the simplest solution is to use a with statement for guaranteed termination:

import sys
from multiprocessing import Pool

def main():
    poolNum = int(sys.argv[1])

    with Pool(poolNum) as pool:  # Pool created
        pool.map(processFunc, fileList)
    # terminate has been called, all workers will be killed

# Adding main guard so this code is valid on Windows and anywhere else which
# doesn't use forking for whatever reason
if __name__ == '__main__':
    main()

As I commented, I wrapped the code in a main function with the standard guard against being invoked on import, because Windows simulates forking by re-importing the main module (but not naming it __main__); without the guard, you can end up with the child processes creating new processes automatically, which is problematic.
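That re-import behavior can be observed even on POSIX by forcing the spawn start method (which is what Windows uses): the module-level print below runs once in the parent and again in every re-imported worker. This is a minimal sketch, not the question's code:

```python
import multiprocessing as mp

# Runs in the parent AND, under spawn, again in each re-imported worker.
print("imported in", mp.current_process().name)

def work(x):
    return x + 1

if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)  # Windows-style behavior
    with mp.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # [2, 3, 4]
```

Code under the `if __name__ == '__main__':` guard does not run in the workers (their `__name__` is not `'__main__'`), which is exactly what prevents the runaway process creation.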

Side-note: If you are dispatching a bunch of tasks but not waiting on them immediately (so you don't want to terminate the pool right after creating it, but do want to ensure the workers are cleaned up promptly), you can still use context management to help out. Just use contextlib.closing to close the pool once all the tasks are dispatched; you must dispatch all the tasks before the end of the with block, but you can retrieve the results later, and when all results have been computed, the child processes will exit. For example:

import sys
from contextlib import closing
from multiprocessing import Pool

def main():
    poolNum = int(sys.argv[1])

    with closing(Pool(poolNum)) as pool:  # Pool created
        results = pool.imap_unordered(processFunc, fileList)
    # close has been called, so no new work can be submitted,
    # and when all outstanding tasks complete, the workers will exit
    # immediately/cleanly

    for res in results:
        # Can still retrieve results even after the pool is closed
        ...

# Adding main guard so this code is valid on Windows and anywhere else which
# doesn't use forking for whatever reason
if __name__ == '__main__':
    main()
