I am attempting to run sets of simulations in parallel, and after each individual set I need to run some commands to remove certain output files the simulations created. Because this second group of commands requires the output file to exist (it is always written regardless of whether the simulation executed properly), it cannot run until all simulations of that set are finished. For reasons I explain later, the remove commands must execute right after their own set is done; I cannot simply run all remove commands after all simulations have finished. This is all occurring on a Linux machine running Anaconda Python 3.6.3.
Here is a simplified version of my code:
from multiprocessing import Pool
import subprocess

def getCMDS(ARGS):
    # do stuff
    return allCMDs

def minions(cmd):
    subprocess.run(cmd, shell=True)
    return

def runParallel(entries):
    for i in entries:
        CMDs = getCMDS(i)
        with Pool(processes=60) as pool:  # multiprocessing.cpu_count() returns 72
            for n, cmd_sets in enumerate(CMDs):
                print('Command Set {} of {} for {}'.format(n+1, len(CMDs), i))
                term_flag = False
                try:
                    pool.map(minions, cmd_sets[0])  # exeCMDs
                except KeyboardInterrupt:
                    pool.terminate()
                    pool.join()
                    term_flag = True
                    break
                for cmd in cmd_sets[1]:  # removeCMDs
                    subprocess.run(cmd, shell=True)
            pool.close()
            pool.join()
        if term_flag:
            break
    return

if __name__ == '__main__':
    list1 = ['A', 'B', 'C']
    runParallel(list1)
To help clarify, allCMDs is a nested list of command strings of the form:

allCMDs = [[exeCMDs, removeCMDs],  # for set 1
           [exeCMDs, removeCMDs],  # for set 2
           ...]                    # ...

where exeCMDs and removeCMDs are their own lists of command strings.
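To make the shape concrete, here is a hypothetical illustration of that structure (the command strings below are placeholders; the real ones come from getCMDS):

```python
# Hypothetical illustration of the allCMDs structure; the actual
# command strings are produced by getCMDS for each entry.
allCMDs = [
    # Set 1: run the simulations, then remove their large output files
    [['./sim --case 1a', './sim --case 1b'],       # exeCMDs
     ['rm -f out_1a.dat', 'rm -f out_1b.dat']],    # removeCMDs
    # Set 2
    [['./sim --case 2a'],
     ['rm -f out_2a.dat']],
]

exeCMDs, removeCMDs = allCMDs[0]
print(len(exeCMDs), len(removeCMDs))  # → 2 2
```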
What I expect to happen is:

1. All simulations of Set 1 for entry A of list1 are run to completion.
2. The remove commands for Set 1 are then executed.
3. Steps 1 and 2 repeat for Set 2, Set 3, and so on, and then for entries B and C of list1.
While monitoring with Linux's htop, what actually happens is that everything looks as expected for the first couple of minutes, but soon I start to see simulations from Set 2 of entry A running alongside simulations from Set 1 of entry A. Eventually simulations from Set 5 and beyond join the list as well. This means pool.map() is not waiting for its processes to finish before the script continues. It also means the remove commands aren't actually doing anything, because they execute before the files are created. This eventually exhausts my machine's storage, as there are many simulations and the files meant for removal are quite large. As an added note that confuses me, I only ever see my print('Command Set...') output to the terminal the very first time. No errors are raised from the script. I must manually stop the script with a KeyboardInterrupt after my storage becomes full, then delete the files and reset to try again.
What am I doing wrong?
Edit: I have solved my problem. I did not realize I had made errors when creating allCMDs in getCMDS(), which contained commands that shouldn't have been included. Below is my final working code:
from multiprocessing import Pool
import subprocess

def getCMDS(ARGS):
    # do stuff
    return allCMDs

def minions(cmd):
    subprocess.run(cmd, shell=True)
    return

def runParallel(entries):
    for i in entries:
        CMDs = getCMDS(i)
        with Pool(processes=60) as pool:
            for n, cmd_sets in enumerate(CMDs):
                print('Command Set {} of {} for {}'.format(n+1, len(CMDs), i))
                term_flag = False
                try:
                    pool.map(minions, cmd_sets[0])
                except KeyboardInterrupt:
                    pool.terminate()
                    pool.join()
                    term_flag = True
                    break
                for cmd in cmd_sets[1]:
                    subprocess.run(cmd, shell=True)
        if term_flag:
            break
    return

if __name__ == '__main__':
    list1 = ['A', 'B', 'C']
    runParallel(list1)
This is not so much an answer but rather some suggestions, because I can't see anything obviously wrong with the logic except for a missing colon (:). It is possible that you have simplified the code to the point that you inadvertently removed the erroneous logic. If I might, however, offer a few suggestions:

You might consider using a multithreading pool instead of a multiprocessing pool, only because your worker functions are doing nothing except calling subprocess.run, which itself starts a new process. Should you also try to terminate the program by entering Ctrl-C, this will also prevent the pool processes from outputting messages (there is a solution to that, discussed later). I would also suggest that a slight rearrangement of the code will result in your only having to create the pool once, rather than repeatedly for each element of the entries argument. This rearrangement can also simplify the logic:
from multiprocessing.pool import ThreadPool
import subprocess

def getCMDS(ARGS):
    # do stuff
    return allCMDs

def minions(cmd):
    subprocess.run(cmd, shell=True)
    return

def runParallel(entries):
    try:
        # multiprocessing.cpu_count() returns 72
        with ThreadPool(processes=60) as pool:
            for i in entries:
                CMDs = getCMDS(i)
                for n, cmd_sets in enumerate(CMDs):
                    print('Command Set {} of {} for {}'.format(n+1, len(CMDs), i))
                    pool.map(minions, cmd_sets[0])  # exeCMDs
                    for cmd in cmd_sets[1]:  # removeCMDs
                        subprocess.run(cmd, shell=True)
            # An implicit pool.terminate() call is made
            # when the with block is exited.
    except KeyboardInterrupt:
        pass

if __name__ == '__main__':
    list1 = ['A', 'B', 'C']
    runParallel(list1)
If you stick with using a multiprocessing pool, then your pool processes should ignore any Ctrl-C interrupts. This can be done by defining an additional function, init_pool_processes, and specifying it as the initializer argument to the multiprocessing.pool.Pool constructor:
from multiprocessing import Pool
import signal

def init_pool_processes():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

...

# multiprocessing.cpu_count() returns 72
with Pool(processes=60, initializer=init_pool_processes) as pool:
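Putting the pieces together, a minimal runnable sketch of the multiprocessing variant might look like this (echo stands in for the real simulation and remove commands, and the pool size is reduced for illustration):

```python
import signal
import subprocess
from multiprocessing import Pool

def init_pool_processes():
    # Each worker ignores SIGINT so only the main process sees Ctrl-C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def minions(cmd):
    subprocess.run(cmd, shell=True)

def runParallel(all_cmds):
    try:
        with Pool(processes=2, initializer=init_pool_processes) as pool:
            for exeCMDs, removeCMDs in all_cmds:
                pool.map(minions, exeCMDs)  # blocks until the whole set is done
                for cmd in removeCMDs:      # then clean up that set's output
                    subprocess.run(cmd, shell=True)
    except KeyboardInterrupt:
        pass  # the with block's implicit terminate() kills the workers

if __name__ == '__main__':
    # placeholder commands standing in for real simulations
    runParallel([[['echo run 1a', 'echo run 1b'], ['echo remove 1']]])
```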
Is there any reason why you do not want to multithread/multiprocess the "remove" commands?
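For instance, the removal loop could itself become a pool.map call. A small self-contained sketch (with placeholder echo commands, and minions modified here to return the exit code so failures are visible):

```python
from multiprocessing.pool import ThreadPool
import subprocess

def minions(cmd):
    # Return the exit code so the caller can check for failures.
    return subprocess.run(cmd, shell=True).returncode

cmd_sets = [['echo run A', 'echo run B'],        # exeCMDs (placeholders)
            ['echo remove A', 'echo remove B']]  # removeCMDs (placeholders)

with ThreadPool(processes=4) as pool:
    exe_rcs = pool.map(minions, cmd_sets[0])  # blocks until every exeCMD finishes
    rm_rcs = pool.map(minions, cmd_sets[1])   # the removes run in parallel too

print(exe_rcs, rm_rcs)  # → [0, 0] [0, 0]
```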