
Python multiprocessing.pool not waiting for processes to finish?

I am attempting to run sets of simulations in parallel, and after each set I need to run some commands to remove certain output files created by the simulations. Because these remove commands require the output file to exist (the file is always written, regardless of whether the simulation executed properly), they cannot run until every simulation in that set has finished. For reasons I explain later, each set's remove commands must run right after that set completes; I cannot simply run all remove commands once every simulation has finished. This is all occurring on a Linux machine running Anaconda Python 3.6.3.

Here is a simplified version of my code:

from multiprocessing import Pool
import subprocess

def getCMDS(ARGS):
    # do stuff
    return allCMDs

def minions(cmd):
    subprocess.run(cmd, shell=True)
    return

def runParallel(entries):
    for i in entries:
        CMDs = getCMDS(i)
        with Pool(processes=60) as pool: # multiprocessing.cpu_count() returns 72
            for n, cmd_sets in enumerate(CMDs):
                print('Command Set {} of {} for {}'.format(n+1, len(CMDs), i))
                term_flag = False
                try:
                    pool.map(minions,cmd_sets[0]) # exeCMDs
                except KeyboardInterrupt:
                    pool.terminate()
                    pool.join()
                    term_flag = True
                    break
                for cmd in cmd_sets[1]: # removeCMDs
                    subprocess.run(cmd, shell=True)
            pool.close()
            pool.join()
        if term_flag:
            break
    return

if __name__ == '__main__':
    list1 = ['A', 'B', 'C']
    runParallel(list1)

To help clarify, allCMDs is a nested list of command strings of the form:

allCMDs = [[exeCMDs, removeCMDs],  # for set 1
           [exeCMDs, removeCMDs],  # for set 2
           ...]                    # ...

where exeCMDs and removeCMDs are their own lists of command strings.
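For concreteness, a hypothetical instance (the real command strings come from my simulation setup and are not shown here) might look like:

allCMDs = [
    [['./sim caseA_1a.inp', './sim caseA_1b.inp'],   # exeCMDs for set 1
     ['rm -f caseA_1a.out', 'rm -f caseA_1b.out']],  # removeCMDs for set 1
    [['./sim caseA_2a.inp'],                         # exeCMDs for set 2
     ['rm -f caseA_2a.out']],                        # removeCMDs for set 2
]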

What I expect to happen is:

  1. All simulations associated with Set 1 pertaining to 'A' in list1 are run to completion
  2. The remove commands for Set 1 entry A are executed
  3. All simulations for Set 2 entry A are run to completion
  4. The remove commands for Set 2 entry A are executed
  5. ... Loop through all sets, then loop through all entries from list1
  6. Complete

While monitoring with Linux's htop, what actually happens is that everything looks as expected for the first couple of minutes, but soon I start to see simulations from Set 2 of entry A running alongside simulations from Set 1 of entry A. Eventually simulations from Set 5 and beyond join the list as well. This means pool.map() is not waiting for its processes to finish before the script continues. It also means the remove commands aren't actually doing anything, because they execute before the files they target are created. This eventually fills my machine's storage, since there are many simulations and the files meant for removal are quite large. As an added note that confuses me, I only ever see my print('Command Set...') output in the terminal the very first time. No errors are raised by the script. I have to stop it manually with a keyboard interrupt once my storage fills up, then delete the files and reset to try again.

What am I doing wrong?

Edit: I have solved my problem. I did not realize I had made errors when building allCMDs in getCMDS(): it contained commands that shouldn't have been included in the sets where they appeared (a hypothetical sketch of this kind of mistake follows the code below). Here is my final working code:

from multiprocessing import Pool
import subprocess

def getCMDS(ARGS):
    # do stuff
    return allCMDs

def minions(cmd):
    subprocess.run(cmd, shell=True)
    return

def runParallel(entries):
    for i in entries:
        CMDs = getCMDS(i)
        with Pool(processes=60) as pool:
            for n, cmd_sets in enumerate(CMDs):
                print('Command Set {} of {} for {}'.format(n+1, len(CMDs), i))
                term_flag = False
                try:
                    pool.map(minions,cmd_sets[0])
                except KeyboardInterrupt:
                    pool.terminate()
                    pool.join()
                    term_flag = True
                    break
                for cmd in cmd_sets[1]:
                    subprocess.run(cmd, shell=True)
        if term_flag:
            break
    return

if __name__ == '__main__':
    list1 = ['A', 'B', 'C']
    runParallel(list1)
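One way such a mistake can happen (a purely hypothetical sketch, since getCMDS itself is not shown above) is reusing the same lists for every set instead of creating fresh ones per set:

def getCMDS(ARGS):
    allCMDs = []
    exeCMDs = []      # bug: created once for all sets
    removeCMDs = []   # bug: created once for all sets
    for set_id in range(1, 4):
        exeCMDs.append('./sim --entry {} --set {}'.format(ARGS, set_id))
        removeCMDs.append('rm -f output_{}_{}.dat'.format(ARGS, set_id))
        # every element of allCMDs ends up referencing the same two lists
        allCMDs.append([exeCMDs, removeCMDs])
    return allCMDs

With this version, by the time Set 1 runs its exeCMDs already holds commands for every set, so pool.map() launches them all at once even though it is blocking correctly. Creating exeCMDs = [] and removeCMDs = [] inside the loop gives each set its own lists.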

This is not so much an answer as a few suggestions, because I can't see anything obviously wrong with the logic except for a missing colon (:). It is possible that you have simplified the code to the point of inadvertently removing the erroneous logic. If I might, however, offer a few suggestions:

You might consider using a multithreading pool instead of a multiprocessing pool, if only because your worker function does nothing except call subprocess.run, which itself starts a new process. Using a thread pool also means that if you terminate the program with Ctrl-C, the pool's workers will not emit their own error messages (for a multiprocessing pool there is a solution to that, discussed later). I would also suggest a slight rearrangement of the code so that the pool is created only once rather than repeatedly for each element of the entries argument. This rearrangement also simplifies the logic:

from multiprocessing.pool import ThreadPool
import subprocess


def getCMDS(ARGS):
    # do stuff
    return allCMDs

def minions(cmd):
    subprocess.run(cmd, shell=True)
    return

def runParallel(entries):
    try:
        # multiprocessing.cpu_count() returns 72
        with ThreadPool(processes=60) as pool:
            for i in entries:
                CMDs = getCMDS(i)
                for n, cmd_sets in enumerate(CMDs):
                    print('Command Set {} of {} for {}'.format(n+1, len(CMDs), i))
                    pool.map(minions, cmd_sets[0]) # exeCMDs
                    for cmd in cmd_sets[1]: # removeCMDs
                        subprocess.run(cmd, shell=True)
        # An implicit pool.terminate() call is made
        # when the with block is terminated.
    except KeyboardInterrupt:
        pass

if __name__ == '__main__':
    list1 = ['A', 'B', 'C']
    runParallel(list1)

If you stick with using a multiprocessing pool, then your pool processes should ignore any Ctrl-C (SIGINT) interrupts. This can be done by defining an additional function, init_pool_processes, and specifying it as the initializer argument to the multiprocessing.pool.Pool constructor:

from multiprocessing import Pool
import signal

def init_pool_processes():
    # Have each worker process ignore SIGINT so that Ctrl-C is handled
    # only by the main process, which can then terminate the pool.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

...

        # multiprocessing.cpu_count() returns 72
        with Pool(processes=60, initializer=init_pool_processes) as pool:

Is there any reason why you do not want to multithread/multiprocess the "remove" commands?
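For example, assuming the remove commands within a set are independent of one another, the rearranged code above could dispatch them through the same pool instead of running them serially in the main thread:

                    pool.map(minions, cmd_sets[0])  # exeCMDs
                    # Assuming the removes within a set don't depend on each
                    # other, they can go through the pool as well:
                    pool.map(minions, cmd_sets[1])  # removeCMDs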
