Python MultiProcessing apply_async wait for all processes to finish

I have been trying to manage a series of batch file processes in parallel, where the processes fall into dependent groups. What I am hoping for is to run all processes of group1 in parallel, wait for all of them to finish, then run group2, and so on. Imagine a series of groups of processes where each process is a separate existing batch file (batch_i.bat).

I have the following code, based on my understanding of the multiprocessing module, so I expect that when the final print commands are called, all the log files are complete, in the sense of having all the numbers printed. However, I notice the Python code finishes successfully without the batch processes being completed.

Python Code:

import multiprocessing as mp
import os
import subprocess

def worker(cmdlist, log):
    with open(log, 'w') as logfile:
        p = subprocess.Popen(cmdlist, stderr=logfile, stdout=logfile)
    # return p.returncode

# --------------------------------------------
# Main Process (Group 1)
# --------------------------------------------
if __name__ == '__main__':
    group1 = [batch_1 , batch_2 , batch_3 , ..., batch_10]
    group2 = [batch_11, batch_12, batch_13, ..., batch_20]
    group3 = [batch_21, batch_22, batch_23, ..., batch_30]

    # Multi-Core Exec
    all_process = group1 
    all_results = []
    pool = mp.Pool(processes=4)

    for myProcess in all_process:
        print("Starting Process: %s" %myProcess)
        log = os.path.splitext(myProcess)[0] + ".log"
        res = pool.apply_async(worker, args=[myProcess, log])
        all_results.append(res)

    pool.close()
    pool.join()
    print("All sub-processes completed")

    for res in all_results:
        res.get()
    print("All sub-processes completed: %s" % [res.successful() for res in all_results])

# --------------------------------------------
# call group 2 and wait for completion
# --------------------------------------------
....

# --------------------------------------------
# call group 3 and wait for completion
# --------------------------------------------
...

The rest of the code calls all processes in group2, which depend on the completion of group 1, and so on.


Batch File: batch_i.bat:

The batch file is just a sample in this case and does nothing but print out a lot of numbers. I have the loop repeated a few times to ensure the batch files take long enough to finish.

@echo off
echo Start of Loop

for /L %%n in (1,1,40000) do echo %%n
for /L %%n in (1,1,40000) do echo %%n
for /L %%n in (1,1,40000) do echo %%n
for /L %%n in (1,1,40000) do echo %%n

echo End of Loop

The output is as below:

> *** Running Base Cases: ***
>      on 4 CPUs
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum2.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum3.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum4.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum2.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum3.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum4.bat
> All sub-processes completed
> All sub-processes completed: [True, True, True, True, True, True, True, True]
>
> Process finished with exit code 0

Although the last two lines are printed, I notice the log files don't have the complete list of numbers printed out, i.e. the batch process is not finished.

The issue is that your workers don't wait for their subprocesses to exit. Add a p.wait() after the p = subprocess.Popen() call in the worker.
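A minimal sketch of that change, assuming the worker keeps the same signature as in the question. Returning the exit code is an optional addition here (the question's code has it commented out), not something the fix requires:

```python
import subprocess


def worker(cmdlist, log):
    """Run one command, send its output to `log`, and block until it exits."""
    with open(log, "w") as logfile:
        p = subprocess.Popen(cmdlist, stdout=logfile, stderr=logfile)
        # Popen() returns as soon as the child has been started.
        # wait() blocks until the child exits and returns its exit code.
        return p.wait()
```

Because the worker now returns only after its child has exited, pool.join() and res.get() in the main process no longer return while batch files are still running.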

Using eight batch files, each with only one for loop to 40000, I got the same results until I ran Popen as a context manager.

def worker(cmdlist, log):
    with open(log, 'w') as logfile:
        with subprocess.Popen(cmdlist, stderr=logfile, stdout=logfile) as p:
            pass
    # return p.returncode

Then the final two print statements did not print until all the cmd windows had closed. Each log file had all the numbers as well as the Start/End of Loop lines.

The docs say that when Popen is used as a context manager, it waits until the process completes.
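A small sketch that illustrates the difference. The child command here is a stand-in chosen for illustration (a short Python sleep rather than a batch file), and the two helper function names are hypothetical:

```python
import subprocess
import sys

# Stand-in for a slow batch file: a child that sleeps briefly, then exits.
CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]


def returncode_right_after_popen():
    """Start the child and look at its status immediately."""
    p = subprocess.Popen(CMD)
    code = p.poll()  # None while the child is still running
    p.wait()         # clean up so the child is not left behind
    return code


def returncode_after_with_block():
    """Start the child inside a with block and look at its status afterwards."""
    with subprocess.Popen(CMD) as p:
        pass  # leaving the block waits for the child to exit
    return p.returncode
```

The first helper is expected to report no exit code yet, because the child is still sleeping when it is polled; the second reports the child's exit code, because leaving the with block waits for the process.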

If you have Python 3.5+, the docs say to use subprocess.run() instead of Popen, and the .run() docs say explicitly that it waits until the command completes. I couldn't test that; I have Python 3.4.
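For Python 3.5+, the worker might look like the sketch below. It assumes the same worker signature as above; returning the exit code is an optional addition:

```python
import subprocess


def worker(cmdlist, log):
    """Run one command with subprocess.run(), logging its output to `log`."""
    with open(log, "w") as logfile:
        # run() starts the command and waits for it to complete
        # before returning a CompletedProcess object.
        completed = subprocess.run(cmdlist, stdout=logfile, stderr=logfile)
    return completed.returncode
```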


The batch files were:

echo off

echo Start of Loop
for /L %%n in (1,1,40000) do echo %%n
echo End of Loop
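With a worker that blocks until its batch file exits, the group-by-group sequencing asked about in the question could be sketched as follows. This is one possible arrangement, not the original poster's code: run_groups and its make_pool parameter are hypothetical names, each group is assumed to be a list of (command, log path) pairs, and stopping on a non-zero exit code is an assumption about what "dependent" should mean.

```python
import multiprocessing as mp
import subprocess


def worker(cmdlist, log):
    """Run one command, log its output, and block until it exits."""
    with open(log, "w") as logfile:
        with subprocess.Popen(cmdlist, stdout=logfile, stderr=logfile) as p:
            pass  # leaving the with block waits for the child
    return p.returncode


def run_groups(groups, processes=4, make_pool=mp.Pool):
    """Run each group of (cmdlist, log) jobs in parallel, one group at a time.

    `make_pool` is a hypothetical hook for swapping the pool type;
    it defaults to mp.Pool, as used in the question.
    """
    all_codes = []
    for number, group in enumerate(groups, start=1):
        with make_pool(processes) as pool:
            # starmap() blocks until every worker call in this group returns,
            # and each worker returns only after its child process has exited.
            codes = pool.starmap(worker, group)
        all_codes.append(codes)
        # Assumption: a non-zero exit code should stop the dependent groups.
        if any(code != 0 for code in codes):
            raise RuntimeError("group %d failed: %s" % (number, codes))
    return all_codes
```

On Windows the call belongs under the if __name__ == '__main__': guard, as in the question, for example run_groups([jobs1, jobs2, jobs3]) where each jobs list pairs a batch file path with its log path.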

