简体   繁体   中英

Python, running command line tools in parallel

I am using Python as a script language to do some data processing and call command-line tools for number crunching. I wish to run command-line tools in parallel since they are independent with each other. When one command-line tool is finished, I can collect its results from the output file. So I also need some synchronization mechanism to notify my main Python program that one task is finished so that the result could be parsed into my main program.

Currently, I use os.system() , which works fine for one-thread, but cannot be parallelized.

Thanks!

If you want to run commandline tools as separate processes, just use os.system (or better: The subprocess module) to start them asynchronously. On Unix/linux/macos:

subprocess.call("command -flags arguments &", shell=True)

On Windows:

subprocess.call("start command -flags arguments", shell=True)

As for knowing when a command has finished: Under unix you could get set up with wait etc., but if you're writing the commandline scripts, I'd just have them write a message into a file, and monitor the file from the calling python script.

@James Youngman proposed a solution to your second question: Synchronization. If you want to control your processes from python, you could start them asynchronously with Popen.

p1 = subprocess.Popen("command1 -flags arguments")
p2 = subprocess.Popen("command2 -flags arguments")

Beware that if you use Popen and your processes write a lot of data to stdout, your program will deadlock. Be sure to redirect all output to a log file.

p1 and p2 are objects that you can use to keep tabs on your processes. p1.poll() will not block, but will return None if the process is still running. It will return the exit status when it is done, so you can check if it is zero.

while True:
    time.sleep(60)
    for proc in [p1, p2]:
        status = proc.poll()
        if status == None:
            continue
        elif status == 0:
            # harvest the answers
        else:
            print "command1 failed with status", status

The above is just a model: As written, it will never exit, and it will keep "harvesting" the results of completed processes. But I trust you get the idea.

Use the Pool object from the multiprocessing module. You can then use eg Pool.map() to do parallel processing. An example would be my markphotos script (see below), where a function is called multiple times in parallel to each process a picture.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Adds my copyright notice to photos.
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2012-10-28 17:00:24 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to markphotos.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

import sys
import subprocess
from multiprocessing import Pool, Lock
from os import utime, devnull
import os.path
from time import mktime

globallock = Lock() 

def processfile(name):
    """Adds copyright notice to the file.

    Arguments:
    name -- file to modify
    """
    args = ['exiftool', '-CreateDate', name]
    createdate = subprocess.check_output(args)
    fields = createdate.split(":") #pylint: disable=E1103
    year = int(fields[1])
    cr = "R.F. Smith <rsmith@xs4all.nl> http://rsmith.home.xs4all.nl/"
    cmt = "Copyright © {} {}".format(year, cr)
    args = ['exiftool', '-Copyright="Copyright (C) {} {}"'.format(year, cr),
            '-Comment="{}"'.format(cmt), '-overwrite_original', '-q', name]
    rv = subprocess.call(args)
    modtime = int(mktime((year, int(fields[2]), int(fields[3][:2]),
                          int(fields[3][3:]), int(fields[4]), int(fields[5]),
                          0,0,-1)))
    utime(name, (modtime, modtime))
    globallock.acquire()
    if rv == 0:
        print "File '{}' processed.".format(name)
    else:
        print "Error when processing file '{}'".format(name)
    globallock.release()

def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- list of commands to pass to subprocess.call.
    """
    if isinstance(args, str):
        args = args.split()
    try:
        with open(devnull, 'w') as f:
            subprocess.call(args, stderr=subprocess.STDOUT, stdout=f)
    except:
        print "Required program '{}' not found! exiting.".format(args[0])
        sys.exit(1)

def main(argv):
    """Main program.

    Arguments:
    argv -- command line arguments
    """
    if len(argv) == 1:
        binary = os.path.basename(argv[0])
        print "Usage: {} [file ...]".format(binary)
        sys.exit(0)
    checkfor(['exiftool',  '-ver'])
    p = Pool()
    p.map(processfile, argv[1:])
    p.close()

if __name__ == '__main__':
    main(sys.argv)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM