Python, running command line tools in parallel

I am using Python as a scripting language to do some data processing and to call command-line tools for number crunching. I wish to run the command-line tools in parallel since they are independent of each other. When a command-line tool finishes, I can collect its results from its output file. So I also need some synchronization mechanism to notify my main Python program that a task has finished, so that the result can be parsed into my main program.

Currently, I use os.system(), which works fine for a single thread, but cannot be parallelized.

Thanks!

If you want to run command-line tools as separate processes, just use os.system (or better: the subprocess module) to start them asynchronously. On Unix/Linux/macOS:

subprocess.call("command -flags arguments &", shell=True)

On Windows:

subprocess.call("start command -flags arguments", shell=True)

As for knowing when a command has finished: under Unix you could set something up with wait etc., but if you're writing the command-line scripts yourself, I'd just have them write a message into a file, and monitor that file from the calling Python script.
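The file-based approach could look roughly like the sketch below. The marker file name, the wait_for_marker helper and the child command are all made up for illustration; a child Python process stands in for a real command-line tool that creates the marker file when it is done.

```python
import os
import subprocess
import sys
import tempfile
import time

def wait_for_marker(path, timeout=30.0, interval=0.1):
    """Poll until `path` exists; return True if it appeared before the timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)

# Hypothetical stand-in for a real tool: a child Python process that
# creates the marker file as its last action.
workdir = tempfile.mkdtemp()
marker = os.path.join(workdir, "job1.done")
child_code = "import sys; open(sys.argv[1], 'w').write('done')"
proc = subprocess.Popen([sys.executable, "-c", child_code, marker])

finished = wait_for_marker(marker)  # True once the marker file shows up
proc.wait()  # reap the child so it does not linger as a zombie
```

The marker only says that the tool got as far as creating the file, so the tool should create it after its real output is complete.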

@James Youngman proposed a solution to your second question: synchronization. If you want to control your processes from Python, you could start them asynchronously with Popen.

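# Pass the command as a list: without shell=True, Unix treats a plain string as the program name.
p1 = subprocess.Popen(["command1", "-flags", "arguments"])
p2 = subprocess.Popen(["command2", "-flags", "arguments"])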

Beware that if you use Popen with stdout=subprocess.PIPE and your processes write a lot of data to stdout, your program can deadlock: once the pipe buffer fills up, the child blocks until someone reads from the pipe. Be sure to redirect all output to a log file.
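One way to do the redirection is to hand Popen an open file instead of a pipe. This is only a sketch: start_logged and the log file name are hypothetical, and the noisy child below is a stand-in for a real tool that prints a lot.

```python
import os
import subprocess
import sys
import tempfile

def start_logged(cmd, log_path):
    """Start cmd without waiting for it; send stdout and stderr to log_path."""
    log = open(log_path, "wb")
    proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    log.close()  # the child keeps its own copy of the file handle
    return proc

# Stand-in for a chatty tool: writes more than a typical pipe buffer holds.
log_path = os.path.join(tempfile.mkdtemp(), "job.log")
noisy = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
proc = start_logged(noisy, log_path)

exit_status = proc.wait()
log_size = os.path.getsize(log_path)
```

Because the output goes straight to a file, the child never has to wait for the parent to read anything.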

p1 and p2 are objects that you can use to keep tabs on your processes. p1.poll() will not block; it returns None if the process is still running, and the exit status once it is done, so you can check whether it is zero.

import time

while True:
    time.sleep(60)
    for proc in [p1, p2]:
        status = proc.poll()
        if status is None:
            continue  # still running
        elif status == 0:
            pass  # harvest the answers here
        else:
            print("command failed with status {}".format(status))

The above is just a model: as written, it will never exit, and it will keep "harvesting" the results of completed processes. But I trust you get the idea.
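A variant that handles each finished process exactly once and exits when nothing is left could be sketched as follows. The run_all helper, the job names and the short poll interval are assumptions for the sake of the example, and the two child Python processes stand in for real command-line tools.

```python
import subprocess
import sys
import time

def run_all(commands, poll_interval=0.1):
    """Start every command at once; return {name: exit_status} when all are done."""
    running = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    results = {}
    while running:
        for name, proc in list(running.items()):
            status = proc.poll()
            if status is None:
                continue  # still running
            # Finished: record the status once and stop tracking the process.
            results[name] = status
            del running[name]
        if running:
            time.sleep(poll_interval)
    return results

# Hypothetical jobs: one succeeds, one exits with status 3.
commands = {
    "ok": [sys.executable, "-c", "pass"],
    "fails": [sys.executable, "-c", "import sys; sys.exit(3)"],
}
results = run_all(commands)
```

The place where the status is recorded is also where the output file of a successful job would be parsed.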

Use the Pool object from the multiprocessing module. You can then use e.g. Pool.map() to do parallel processing. An example would be my markphotos script (see below), where a function is called multiple times in parallel, each call processing one picture.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Adds my copyright notice to photos.
#
# Author: R.F. Smith <rsmith@xs4all.nl>
# $Date: 2012-10-28 17:00:24 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to markphotos.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

import sys
import subprocess
from multiprocessing import Pool, Lock
from os import utime, devnull
import os.path
from time import mktime

globallock = Lock() 

def processfile(name):
    """Adds copyright notice to the file.

    Arguments:
    name -- file to modify
    """
    args = ['exiftool', '-CreateDate', name]
    # universal_newlines=True returns text rather than bytes on Python 3.
    createdate = subprocess.check_output(args, universal_newlines=True)
    fields = createdate.split(":") #pylint: disable=E1103
    year = int(fields[1])
    cr = "R.F. Smith <rsmith@xs4all.nl> http://rsmith.home.xs4all.nl/"
    cmt = "Copyright © {} {}".format(year, cr)
    # No shell is involved, so the values must not carry their own quotes.
    args = ['exiftool', '-Copyright=Copyright (C) {} {}'.format(year, cr),
            '-Comment={}'.format(cmt), '-overwrite_original', '-q', name]
    rv = subprocess.call(args)
    modtime = int(mktime((year, int(fields[2]), int(fields[3][:2]),
                          int(fields[3][3:]), int(fields[4]), int(fields[5]),
                          0,0,-1)))
    utime(name, (modtime, modtime))
    with globallock:  # keep output from parallel workers from interleaving
        if rv == 0:
            print("File '{}' processed.".format(name))
        else:
            print("Error when processing file '{}'".format(name))

def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- list of commands to pass to subprocess.call.
    """
    if isinstance(args, str):
        args = args.split()
    try:
        with open(devnull, 'w') as f:
            subprocess.call(args, stderr=subprocess.STDOUT, stdout=f)
    except OSError:
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)

def main(argv):
    """Main program.

    Arguments:
    argv -- command line arguments
    """
    if len(argv) == 1:
        binary = os.path.basename(argv[0])
        print("Usage: {} [file ...]".format(binary))
        sys.exit(0)
    checkfor(['exiftool',  '-ver'])
    p = Pool()
    p.map(processfile, argv[1:])
    p.close()

if __name__ == '__main__':
    main(sys.argv)
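
Stripped of the photo-specific parts, the same map pattern might look like the sketch below. Note the swap: it uses multiprocessing.pool.ThreadPool rather than Pool. ThreadPool offers the same map() interface, and threads are enough here because the actual work happens in the external processes. The run_tool function and the squaring command are hypothetical stand-ins for a real tool.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool

def run_tool(arg):
    """Run one external command and return (arg, exit_status, captured_output)."""
    # Hypothetical stand-in for a real tool: a child Python that squares its argument.
    cmd = [sys.executable, "-c", "import sys; print(int(sys.argv[1]) ** 2)", str(arg)]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()  # reads all output, so the pipe cannot fill up
    return arg, proc.returncode, out.strip()

pool = ThreadPool(4)  # at most four commands run at the same time
try:
    # map() blocks until every command has finished and keeps the input order.
    results = pool.map(run_tool, [1, 2, 3, 4])
finally:
    pool.close()
    pool.join()
```

Since map() only returns when all commands are done, it also answers the synchronization part of the question: the results can be parsed right after the call.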
