Python MultiProcessing apply_async wait for all processes to finish

I am trying to run a series of batch files in parallel, where the processes fall into dependent groups. I want to run all processes of group1 in parallel, wait for all of them to finish, then run group2, and so on. Each process is a separate, existing batch file (batch_i.bat).

I wrote the following code based on my understanding of the multiprocessing module. I expected that by the time the final print statements run, every log file would be complete, that is, contain all of the printed numbers. However, the Python code finishes successfully before the batch processes have completed.

Python Code:

import multiprocessing as mp
import os
import subprocess

def worker(cmdlist, log):
    with open(log, 'w') as logfile:
        p = subprocess.Popen(cmdlist, stderr=logfile, stdout=logfile)
    # return p.returncode

# --------------------------------------------
# Main Process (Group 1)
# --------------------------------------------
if __name__ == '__main__':
    group1 = [batch_1 , batch_2 , batch_3 , ..., batch_10]
    group2 = [batch_11, batch_12, batch_13, ..., batch_20]
    group3 = [batch_21, batch_22, batch_23, ..., batch_30]

    # Multi-Core Exec
    all_process = group1 
    all_results = []
    pool = mp.Pool(processes=4)

    for myProcess in all_process:
        print("Starting Process: %s" %myProcess)
        log = os.path.splitext(myProcess)[0] + ".log"
        res = pool.apply_async(worker, args=[myProcess, log])
        all_results.append(res)

    pool.close()
    pool.join()
    print("All sub-processes completed")

    for res in all_results:
        res.get()
    print("All sub-processes completed: %s" % [res.successful() for res in all_results])

# --------------------------------------------
# call group 2 and wait for completion
# --------------------------------------------
....

# --------------------------------------------
# call group 3 and wait for completion
# --------------------------------------------
...

The rest of the code calls all the processes in group2, which depend on the completion of group 1, and so on.


Batch File : batch_i.bat:

The batch file is only a sample here and does nothing but print a lot of numbers. I repeated the loop a few times to make sure each batch file takes long enough to finish.

@echo off
echo Start of Loop

for /L %%n in (1,1,40000) do echo %%n
for /L %%n in (1,1,40000) do echo %%n
for /L %%n in (1,1,40000) do echo %%n
for /L %%n in (1,1,40000) do echo %%n

echo End of Loop

The output is as below:

> *** Running Base Cases: ***
>      on 4 CPUs
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum2.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum3.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum4.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum2.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum3.bat
> Process: C:\Users\mamo8001\Project\Clustering\01 Codes\testNum4.bat
> All sub-processes completed
> All sub-processes completed: [True, True, True, True, True, True, True, True]
>
> Process finished with exit code 0

Although the last two lines are printed, the log files don't contain the complete list of numbers, i.e. the batch processes have not finished.

The issue is that your workers don't wait for their subprocesses to exit. Add a p.wait() after the p = subprocess.Popen() call in the worker.
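A minimal sketch of that change, keeping the worker signature from the question. Returning the exit code is an optional addition here (the question had it commented out), not something the fix requires.

```python
import subprocess

def worker(cmdlist, log):
    """Run one command, send its output to `log`, and block until it exits."""
    with open(log, 'w') as logfile:
        p = subprocess.Popen(cmdlist, stderr=logfile, stdout=logfile)
        # Popen returns as soon as the child has started; wait() blocks
        # until the child exits and returns its exit code.
        returncode = p.wait()
    return returncode
```

With the worker blocking like this, each apply_async result only becomes ready once its batch file has exited, so pool.close() followed by pool.join() should then wait for the whole group.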

Using eight batch files, each with a single for loop to 40000, I got the same results until I ran Popen as a context manager.

def worker(cmdlist, log):
    with open(log, 'w') as logfile:
        with subprocess.Popen(cmdlist, stderr=logfile, stdout=logfile) as p:
            pass
    # return p.returncode

With that change, the final two print statements did not run until all the cmd windows had closed. Each log file contained all the numbers as well as the Start/End of Loop lines.

According to the docs, when Popen is used as a context manager it waits for the process to complete on exit.

If you have Python 3.5+, the docs say to use subprocess.run() instead of Popen, and the .run() docs state explicitly that it waits until the command completes. I couldn't test that, since I have Python 3.4.
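For reference, a sketch of what the worker might look like with subprocess.run() on Python 3.5+. Unlike the context-manager version above, it was not run against the batch files from the question, and returning the exit code is an assumption about what the caller wants.

```python
import subprocess

def worker(cmdlist, log):
    """Run one command to completion, writing stdout and stderr to `log`."""
    with open(log, 'w') as logfile:
        # run() does not return until the command has finished.
        completed = subprocess.run(cmdlist, stdout=logfile, stderr=logfile)
    return completed.returncode
```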


Batch files were

echo off

echo Start of Loop
for /L %%n in (1,1,40000) do echo %%n
echo End of Loop
