I have a script that takes an input path and starts a Spark job on the cluster, something like:
spark-submit driver.py -i input_path
Now I have a list of paths, and I want to launch jobs for all of them simultaneously.
Here is what I tried:
import subprocess

base_command = 'spark-submit driver.py -i %s'
for path in paths:
    command = base_command % path
    subprocess.Popen(command, shell=True)
My hope was that all of the shell commands would execute simultaneously, but instead I am noticing that they run one at a time.
How do I execute all of these commands simultaneously? Thanks.
This is where multiprocessing.Pool comes in; it is designed for exactly this case. It automatically distributes many inputs across a pool of workers.
from multiprocessing import Pool
import subprocess

def run_command(path):
    command = "spark-submit driver.py -i {}".format(path)
    # Use call() rather than Popen() so the worker waits for the
    # job to finish; otherwise the worker returns immediately.
    subprocess.call(command, shell=True)

pool = Pool()
pool.map(run_command, paths)
Pool() creates one worker process per CPU core by default (pass a number to change that), and pool.map runs run_command over all the paths in parallel, returning once every command has finished.
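Since each worker spends its time waiting on an external process, threads work just as well as processes here. Below is a minimal runnable sketch using concurrent.futures; the paths are made-up examples, and a trivial Python one-liner stands in for the real spark-submit command line so the sketch runs anywhere:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def build_command(path):
    # Real version, per the question (assumption):
    # return ["spark-submit", "driver.py", "-i", path]
    # Stand-in so this sketch is runnable without Spark:
    return [sys.executable, "-c", "print({!r})".format(path)]

def run_command(path):
    # subprocess.run waits for the process to exit, so each
    # thread stays occupied for the lifetime of its job.
    return path, subprocess.run(build_command(path)).returncode

paths = ["/data/a", "/data/b", "/data/c"]  # example inputs

# One thread per path: each thread just blocks on its own
# subprocess, so all jobs run concurrently.
with ThreadPoolExecutor(max_workers=len(paths)) as pool:
    results = dict(pool.map(run_command, paths))

print(results)
```

Passing the command as a list avoids shell=True entirely, which also sidesteps quoting problems if a path contains spaces.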