
How to call a Linux command-line program in parallel with Python

I have a command-line program which runs on a single core. It takes an input file, does some calculations, and produces several files which I need to parse to store the output. I have to call the program several times, changing the input file each time. To speed things up, I was thinking parallelization would be useful. Until now I have performed this task by calling every run separately within a loop with the subprocess module.
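
Roughly, my current sequential loop looks like the sketch below (the program name, the input file names, and parse_output are placeholders, not the real names):

import os
import subprocess

def parse_output(run_dir):
    # Placeholder: read the output files produced in run_dir
    # and return whatever data needs to be stored.
    return run_dir

input_files = ['input_01.dat', 'input_02.dat', 'input_03.dat']  # placeholders

results = []
for i, input_file in enumerate(input_files):
    run_dir = f'run_{i:03d}'
    os.makedirs(run_dir, exist_ok=True)
    # Run the single-core program with its output files written into run_dir.
    subprocess.call(['my_program', input_file], cwd=run_dir)
    results.append(parse_output(run_dir))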

I wrote a script which creates a new working folder on every run and then calls the program, directing its output to that folder, and returns some data which I need to store. My question is, how can I adapt the following code, found here, to execute my script, always using the indicated number of CPUs, and to store the output? Note that each run has a unique running time. Here is the mentioned code:

import subprocess
import multiprocessing as mp
from tqdm import tqdm

NUMBER_OF_TASKS = 4
progress_bar = tqdm(total=NUMBER_OF_TASKS)

def work(sec_sleep):
    command = ['python', 'worker.py', sec_sleep]
    subprocess.call(command)


def update_progress_bar(_):
    progress_bar.update()


if __name__ == '__main__':
    pool = mp.Pool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()

The tqdm progress bar does not work too well with multiprocessing and should instead be used with multithreading. That is just as well here, since the subprocess.call method already creates a new process, so you will be using multiprocessing anyway. And had tqdm worked well with multiprocessing, and had your platform been one that uses the spawn method to create new processes (such as Windows), then creating the progress bar outside of the if __name__ == '__main__': block would have resulted in 4 additional progress bars that did nothing. Not good!

import subprocess
from multiprocessing.pool import ThreadPool
from tqdm import tqdm


def work(sec_sleep):
    # Each task launches worker.py in a separate child process.
    command = ['python', 'worker.py', sec_sleep]
    subprocess.call(command)


def update_progress_bar(_):
    # Callback invoked when a task completes; advances the progress bar.
    progress_bar.update()


if __name__ == '__main__':
    NUMBER_OF_TASKS = 4

    progress_bar = tqdm(total=NUMBER_OF_TASKS)

    # A thread pool is enough here: the real work runs in the subprocesses,
    # and the pool threads just wait for them to finish.
    pool = ThreadPool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()

Note: If your worker.py program prints to the console, it will mess up the progress bar.
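
To adapt this to your actual program, one possible sketch (with my_program, the input file names, and parse_output as placeholders) is to have each task create its own working folder, run the program there, and return the parsed data through the AsyncResult objects; capturing stdout/stderr with subprocess.run also keeps the program's prints from disturbing the progress bar:

import os
import subprocess
from multiprocessing.pool import ThreadPool
from tqdm import tqdm


def parse_output(run_dir):
    # Placeholder: parse the files produced in run_dir and return the data to store.
    return run_dir


def work(task):
    index, input_file = task
    run_dir = f'run_{index:03d}'
    os.makedirs(run_dir, exist_ok=True)
    # Capture stdout/stderr so the program's console output does not
    # interfere with the progress bar.
    subprocess.run(['my_program', input_file], cwd=run_dir,
                   capture_output=True, text=True)
    return parse_output(run_dir)


def update_progress_bar(_):
    progress_bar.update()


if __name__ == '__main__':
    NUMBER_OF_CPUS = 4
    input_files = ['input_01.dat', 'input_02.dat', 'input_03.dat']  # placeholders

    progress_bar = tqdm(total=len(input_files))
    pool = ThreadPool(NUMBER_OF_CPUS)

    async_results = [pool.apply_async(work, (task,), callback=update_progress_bar)
                     for task in enumerate(input_files)]

    pool.close()
    pool.join()
    progress_bar.close()

    # Collect the data returned by each run, in submission order.
    results = [r.get() for r in async_results]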
