How to run multiple Python scripts simultaneously from a wrapper script in such a way that CPU utilization is maximized?

I have to run about 200-300 Python scripts daily, each with different arguments, for example:

python scripts/foo.py -a bla -b blabla ..
python scripts/foo.py -a lol -b lolol ..
....

Let's say I already have the arguments for every script in a list, and I would like to execute the scripts concurrently so that the CPU is always busy. How can I do so?

My current solution:

Script for running multiple processes:

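    import subprocess

    # jobs is a list of shell command strings, one per script invocation.
    workers = 15
    for i in range(0, len(jobs), workers):
        # Join one batch of commands with " & " so the shell backgrounds each.
        job_string = ""
        for j in range(i, min(i + workers, len(jobs))):
            job_string += jobs[j] + " & "
        if len(job_string) == 0:
            continue
        print(job_string)
        # Blocks until every process in the batch has finished.
        val = subprocess.check_call("./scripts/parallelProcessing.sh '%s'" % job_string, shell=True)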

scripts/parallelProcessing.sh (used in the above script):

echo $1
echo "running scripts in parallel"
eval $1
wait
echo "done processing"

Drawback:

I am executing K processes in a batch, then another K, and so on. But CPU core utilization drops as the number of running processes in a batch keeps decreasing, until eventually only one process is running at a time (for a given batch). As a result, the time taken to complete all the processes is significant.

One simple solution is to ensure that K processes are always running, i.e. once a process completes, a new one is scheduled. But I am not sure how to implement such a solution.

Expectations:

As the task is not very latency-sensitive, I am looking for a simple solution that keeps the CPU mostly busy.

Note: Any two of those processes can execute simultaneously without any concurrency issues. The host where these processes run has Python 2.

This is a technique I developed for calling many external programs using subprocess.Popen. In this example, I'm calling convert to make JPEG images from DICOM files.

In short: it uses manageprocs to keep checking a list of running subprocesses. If one is finished, it is removed and a new one is started as long as unprocessed files remain. After that, the remaining processes are watched until they are all finished.

from datetime import datetime
from functools import partial
import argparse
import logging
import os
import subprocess as sp
import sys
import time


def main():
    """
    Entry point for dicom2jpg.
    """
    args = setup()
    if not args.fn:
        logging.error("no files to process")
        sys.exit(1)
    if args.quality != 80:
        logging.info(f"quality set to {args.quality}")
    if args.level:
        logging.info("applying level correction.")
    start_partial = partial(start_conversion, quality=args.quality, level=args.level)

    starttime = str(datetime.now())[:-7]
    logging.info(f"started at {starttime}.")
    # List of subprocesses
    procs = []
    # Do not launch more processes concurrently than your CPU has cores.
    # That will only lead to the processes fighting over CPU resources.
    # os.cpu_count() can return None, so fall back to a single process.
    maxprocs = os.cpu_count() or 1
    # Launch and manage subprocesses for all files.
    for path in args.fn:
        while len(procs) >= maxprocs:
            manageprocs(procs)
        procs.append(start_partial(path))
    # Wait for all subprocesses to finish.
    while len(procs) > 0:
        manageprocs(procs)
    endtime = str(datetime.now())[:-7]
    logging.info(f"completed at {endtime}.")


def start_conversion(filename, quality, level):
    """
    Convert a DICOM file to a JPEG file.

    Removing the blank areas from the Philips detector.

    Arguments:
        filename: name of the file to convert.
        quality: JPEG quality to apply
        level: Boolean to indicate whether level adjustment should be done.
    Returns:
        Tuple of (input filename, output filename, subprocess.Popen)
    """
    outname = filename.strip() + ".jpg"
    size = "1574x2048"
    args = [
        "convert",
        filename,
        "-units",
        "PixelsPerInch",
        "-density",
        "300",
        "-depth",
        "8",
        "-crop",
        size + "+232+0",
        "-page",
        size + "+0+0",
        "-auto-gamma",
        "-quality",
        str(quality),
    ]
    if level:
        args += ["-level", "-35%,70%,0.5"]
    args.append(outname)
    proc = sp.Popen(args, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
    return (filename, outname, proc)


def manageprocs(proclist):
    """Check a list of subprocesses for processes that have ended and
    remove them from the list.

    Arguments:
        proclist: List of tuples. The last item in the tuple must be
                  a subprocess.Popen object.
    """
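    # Iterate over a copy: removing items from a list while iterating
    # over that same list would skip the element after each removal.
    for item in proclist[:]: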
        filename, outname, proc = item
        if proc.poll() is not None:
            logging.info(f"conversion of “{filename}” to “{outname}” finished.")
            proclist.remove(item)
    # since manageprocs is called from a loop, keep CPU usage down.
    time.sleep(0.05)


if __name__ == "__main__":
    main()

I've left out setup(); it uses argparse to deal with command-line arguments.

Here the thing to be processed is just a list of file names. But it could also be (in your case) a list of tuples of script names and arguments.
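Adapted to the question, the same idea might look roughly like the sketch below. The function name run_jobs and the shape of jobs (a list of argument lists) are assumptions for illustration, and the stand-in jobs at the bottom only sleep or exit. Like the listing above, this sketch needs Python 3; os.cpu_count() is not available on Python 2, where multiprocessing.cpu_count() could be used instead.

```python
import os
import subprocess as sp
import sys
import time


def run_jobs(jobs, maxprocs=None):
    """Run every command in jobs, keeping at most maxprocs running at once.

    jobs is a list of argument lists, for example
    ["python", "scripts/foo.py", "-a", "bla", "-b", "blabla"].
    Returns a list of (command, return code) tuples in completion order.
    """
    if maxprocs is None:
        maxprocs = os.cpu_count() or 1
    pending = list(jobs)
    running = []   # (command, Popen) tuples for processes still alive
    finished = []  # (command, return code) tuples
    while pending or running:
        # Start new processes until the limit is reached or no jobs remain.
        while pending and len(running) < maxprocs:
            cmd = pending.pop(0)
            running.append((cmd, sp.Popen(cmd)))
        # Collect the processes that have ended; keep the rest.
        still_running = []
        for cmd, proc in running:
            returncode = proc.poll()
            if returncode is None:
                still_running.append((cmd, proc))
            else:
                finished.append((cmd, returncode))
        running = still_running
        # Avoid a busy loop while waiting for processes to end.
        time.sleep(0.05)
    return finished


if __name__ == "__main__":
    # Stand-in jobs; replace with the real script names and arguments.
    demo = [[sys.executable, "-c", "import time; time.sleep(0.2)"]
            for _ in range(6)]
    results = run_jobs(demo, maxprocs=3)
    print(len(results), "jobs finished")
```

Because a new process is started as soon as a slot frees up, a single long-running job no longer holds up the rest of a batch. Checking the collected return codes afterwards shows which jobs failed.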
