
How to run multiple Python scripts simultaneously from a wrapper script in such a way that CPU utilization is maximized?

I have to run about 200-300 Python scripts every day, each with different arguments, for example:

python scripts/foo.py -a bla -b blabla ..
python scripts/foo.py -a lol -b lolol ..
....

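Assume that I already have all the arguments for each script in a list, and that I want to execute the scripts concurrently so that the CPU is always busy. How can I do that?

My current solution:

A script that runs multiple processes: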

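    import subprocess

    # jobs: list of command strings, e.g. "python scripts/foo.py -a bla -b blabla"
    workers = 15
    # Launch the jobs in fixed batches of `workers`; each batch must finish
    # completely before the next one starts.
    for i in range(0, len(jobs), workers):
        # Join the batch into "cmd1 & cmd2 & ... & " so the shell backgrounds each job.
        job_string = ""
        for j in range(i, min(i + workers, len(jobs))):
            job_string += jobs[j] + " & "
        if len(job_string) == 0:
            continue
        print(job_string)
        val = subprocess.check_call("./scripts/parallelProcessing.sh '%s'" % job_string, shell=True)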

scripts/parallelProcessing.sh (used in the script above):

echo $1
echo "running scripts in parallel"
eval $1
wait
echo "done processing"

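Drawbacks:

I execute K processes as a batch, then another K, and so on. But as the processes in a batch finish one by one, fewer and fewer cores are in use, and towards the end of each batch only a single process is still running. As a result, completing all the processes takes a very long time.

One simple solution is to ensure that K processes are always running, i.e. as soon as one process finishes, a new one is scheduled. But I am not sure how to implement such a solution.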
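The gap between the two strategies can be illustrated with a toy calculation. This is only a sketch: the job durations are made up, and `batch_makespan` and `rolling_makespan` are hypothetical helper names, not part of the scripts above. It assumes each job occupies exactly one slot for its whole duration.

```python
import heapq


def batch_makespan(durations, k):
    """Total time when jobs run in fixed batches of k.

    Each batch lasts as long as its slowest job, because the next
    batch only starts once every job in the current one is done.
    """
    return sum(max(durations[i:i + k]) for i in range(0, len(durations), k))


def rolling_makespan(durations, k):
    """Total time when a new job starts as soon as one of k slots is free."""
    slots = [0] * k  # time at which each slot becomes free
    heapq.heapify(slots)
    for d in durations:
        free_at = heapq.heappop(slots)      # earliest slot to free up
        heapq.heappush(slots, free_at + d)  # the job occupies it for d time units
    return max(slots)


# Made-up durations: mostly short jobs with an occasional long one.
durations = [1, 1, 1, 10, 1, 1, 1, 10]
print(batch_makespan(durations, 4))    # 20
print(rolling_makespan(durations, 4))  # 12
```

With these numbers, each batch is held up by its one long job while three slots sit idle, whereas the rolling scheme fills the idle slots with the remaining short jobs.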

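Expectations:

Since the task is not very latency-sensitive, I am looking for a simple solution that keeps the CPU busy most of the time.

Note: Any two of these processes can run at the same time without any concurrency issues. The host on which these processes run has python2.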

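This is a technique that I developed for calling many external programs using subprocess.Popen. In this example, I call convert to generate JPEG images from DICOM files.

In short: it uses manageprocs to keep checking a list of running subprocesses. If one has finished, it is removed from the list and a new one is started, as long as unprocessed files remain. After that, the remaining processes are monitored until they have all finished.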

from datetime import datetime
from functools import partial
import argparse
import logging
import os
import subprocess as sp
import sys
import time


def main():
    """
    Entry point for dicom2jpg.
    """
    args = setup()
    if not args.fn:
        logging.error("no files to process")
        sys.exit(1)
    if args.quality != 80:
        logging.info(f"quality set to {args.quality}")
    if args.level:
        logging.info("applying level correction.")
    start_partial = partial(start_conversion, quality=args.quality, level=args.level)

    starttime = str(datetime.now())[:-7]
    logging.info(f"started at {starttime}.")
    # List of subprocesses
    procs = []
    # Do not launch more processes concurrently than your CPU has cores.
    # That will only lead to the processes fighting over CPU resources.
    # os.cpu_count() can return None; fall back to a single process then.
    maxprocs = os.cpu_count() or 1
    # Launch and manage subprocesses for all files.
    for path in args.fn:
        while len(procs) == maxprocs:
            manageprocs(procs)
        procs.append(start_partial(path))
    # Wait for all subprocesses to finish.
    while len(procs) > 0:
        manageprocs(procs)
    endtime = str(datetime.now())[:-7]
    logging.info(f"completed at {endtime}.")


def start_conversion(filename, quality, level):
    """
    Convert a DICOM file to a JPEG file.

    Removing the blank areas from the Philips detector.

    Arguments:
        filename: name of the file to convert.
        quality: JPEG quality to apply
        level: Boolean to indicate whether level adjustment should be done.
    Returns:
        Tuple of (input filename, output filename, subprocess.Popen)
    """
    outname = filename.strip() + ".jpg"
    size = "1574x2048"
    args = [
        "convert",
        filename,
        "-units",
        "PixelsPerInch",
        "-density",
        "300",
        "-depth",
        "8",
        "-crop",
        size + "+232+0",
        "-page",
        size + "+0+0",
        "-auto-gamma",
        "-quality",
        str(quality),
    ]
    if level:
        args += ["-level", "-35%,70%,0.5"]
    args.append(outname)
    proc = sp.Popen(args, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
    return (filename, outname, proc)


def manageprocs(proclist):
    """Check a list of subprocesses for processes that have ended and
    remove them from the list.

    Arguments:
        proclist: List of tuples. The last item in the tuple must be
                  a subprocess.Popen object.
    """
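    # Iterate over a copy: removing items from a list while looping over it
    # would skip the entry that follows each removed one.
    for item in list(proclist):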
        filename, outname, proc = item
        if proc.poll() is not None:
            logging.info(f"conversion of “{filename}” to “{outname}” finished.")
            proclist.remove(item)
    # since manageprocs is called from a loop, keep CPU usage down.
    time.sleep(0.05)


if __name__ == "__main__":
    main()

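I have left out setup(); it uses argparse to handle the command-line arguments.

Here, the thing to be processed is just a list of file names. But it could also be (in your case) a list of tuples of script names and arguments.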
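A minimal sketch of that adaptation follows, assuming each job is given as an argument list. The function name `run_all` and the stand-in demo commands are hypothetical and are not taken from the code above. The sketch avoids f-strings and `os.cpu_count()`, so it should also run on Python 2, although it has not been tested there.

```python
import subprocess
import sys
import time


def run_all(commands, maxprocs):
    """Run every command, keeping at most maxprocs child processes alive.

    commands: list of argument lists, e.g.
              ["python", "scripts/foo.py", "-a", "bla", "-b", "blabla"]
    Returns a list of (command, returncode) tuples in completion order.
    """
    pending = list(commands)
    running = []   # (command, Popen) pairs that are still executing
    finished = []  # (command, returncode) pairs
    while pending or running:
        # Start new children until the pool is full or nothing is left.
        while pending and len(running) < maxprocs:
            cmd = pending.pop(0)
            running.append((cmd, subprocess.Popen(cmd)))
        # Sort the children into finished and still running.
        still_running = []
        for cmd, proc in running:
            returncode = proc.poll()
            if returncode is None:
                still_running.append((cmd, proc))
            else:
                finished.append((cmd, returncode))
        running = still_running
        # Sleep briefly so the polling loop itself does not hog a core.
        if running:
            time.sleep(0.05)
    return finished


if __name__ == "__main__":
    # Stand-in jobs (assumption) so the sketch runs anywhere; replace them
    # with the real script invocations.
    demo = [[sys.executable, "-c", "import time; time.sleep(0.2)"] for _ in range(6)]
    for cmd, returncode in run_all(demo, maxprocs=3):
        print("exit status %d: %s" % (returncode, " ".join(cmd)))
```

For the jobs in the question, each entry would be an argument list such as `["python", "scripts/foo.py", "-a", "bla", "-b", "blabla"]`; `shlex.split` can turn an existing command string into such a list. Passing argument lists to `Popen` avoids going through a shell, so no quoting or `eval` is needed.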

