How to run multiple python scripts simultaneously from a wrapper script in such a way that CPU utilization is maximized?
I have to run around 200-300 Python scripts every day, each with different arguments, e.g.:
python scripts/foo.py -a bla -b blabla ..
python scripts/foo.py -a lol -b lolol ..
....
Assume I already have all of these command lines, one per script, in a list, and I want to execute them concurrently so that the CPU is always busy. How can I do that?

My current solution:

Script that runs multiple processes:
import subprocess

workers = 15
for i in range(0, len(jobs), workers):
    job_string = ""
    for j in range(i, min(i + workers, len(jobs))):
        job_string += jobs[j] + " & "
    if len(job_string) == 0:
        continue
    print(job_string)
    val = subprocess.check_call("./scripts/parallelProcessing.sh '%s'" % job_string, shell=True)
scripts/parallelProcessing.sh (used in the script above):
#!/bin/sh
echo "$1"
echo "running scripts in parallel"
eval $1
wait
echo "done processing"
Drawback:
I am executing K processes in one batch, then another K, and so on. But as the processes in a batch finish, the number still running keeps shrinking, so CPU core utilization drops, and eventually only one process is running (for a given batch). As a result, the total time taken to finish all the processes is very long.
A simple fix would be to make sure K processes are always running, i.e. to schedule a new process as soon as a previous one finishes. But I am not sure how to implement such a solution.
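One way to get "always K processes running" without writing a scheduler by hand is a fixed-size worker pool: each worker thread blocks on one external process, and the pool hands it the next job as soon as the process exits. A minimal sketch using the standard-library `concurrent.futures` (Python 3; on Python 2 the `futures` backport provides the same API), with hypothetical stand-in commands in place of the real `python scripts/foo.py ...` invocations:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real "python scripts/foo.py -a ... -b ..."
# command lines; replace these with your actual argument lists.
jobs = [[sys.executable, "-c", "print('job %d')" % n] for n in range(8)]

def run(cmd):
    # Each worker thread blocks on one subprocess, so a pool of K threads
    # keeps at most K external processes running at any time.
    return subprocess.call(cmd)

# As soon as one script exits, its thread immediately starts the next job.
with ThreadPoolExecutor(max_workers=4) as pool:
    exit_codes = list(pool.map(run, jobs))
```

Threads are fine here despite the GIL, because each thread spends its time waiting on a child process rather than executing Python code.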
Expectation:

Since the task is not very latency-sensitive, I am looking for a simple solution that keeps the CPU busy most of the time.

Note: any two of these processes can execute simultaneously without any concurrency issues. The host running these processes has python2.
This is a technique I developed for calling many external programs using subprocess.Popen. In this example, I call convert to generate JPEG images from DICOM files.

In short, it uses manageprocs to keep checking a list of running subprocesses. If one has finished, it is removed from the list and a new one is started, as long as unprocessed files remain. After that, the remaining processes are monitored until they have all finished.
from datetime import datetime
from functools import partial
import argparse
import logging
import os
import subprocess as sp
import sys
import time


def main():
    """
    Entry point for dicom2jpg.
    """
    args = setup()
    if not args.fn:
        logging.error("no files to process")
        sys.exit(1)
    if args.quality != 80:
        logging.info(f"quality set to {args.quality}")
    if args.level:
        logging.info("applying level correction.")
    start_partial = partial(start_conversion, quality=args.quality, level=args.level)
    starttime = str(datetime.now())[:-7]
    logging.info(f"started at {starttime}.")
    # List of subprocesses
    procs = []
    # Do not launch more processes concurrently than your CPU has cores.
    # That will only lead to the processes fighting over CPU resources.
    maxprocs = os.cpu_count()
    # Launch and manage subprocesses for all files.
    for path in args.fn:
        while len(procs) == maxprocs:
            manageprocs(procs)
        procs.append(start_partial(path))
    # Wait for all subprocesses to finish.
    while len(procs) > 0:
        manageprocs(procs)
    endtime = str(datetime.now())[:-7]
    logging.info(f"completed at {endtime}.")


def start_conversion(filename, quality, level):
    """
    Convert a DICOM file to a JPEG file.
    Removing the blank areas from the Philips detector.

    Arguments:
        filename: name of the file to convert.
        quality: JPEG quality to apply
        level: Boolean to indicate whether level adjustment should be done.

    Returns:
        Tuple of (input filename, output filename, subprocess.Popen)
    """
    outname = filename.strip() + ".jpg"
    size = "1574x2048"
    args = [
        "convert",
        filename,
        "-units",
        "PixelsPerInch",
        "-density",
        "300",
        "-depth",
        "8",
        "-crop",
        size + "+232+0",
        "-page",
        size + "+0+0",
        "-auto-gamma",
        "-quality",
        str(quality),
    ]
    if level:
        args += ["-level", "-35%,70%,0.5"]
    args.append(outname)
    proc = sp.Popen(args, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
    return (filename, outname, proc)


def manageprocs(proclist):
    """Check a list of subprocesses for processes that have ended and
    remove them from the list.

    Arguments:
        proclist: List of tuples. The last item in the tuple must be
            a subprocess.Popen object.
    """
    # Iterate over a copy so that removing items from proclist is safe.
    for item in proclist[:]:
        filename, outname, proc = item
        if proc.poll() is not None:
            logging.info(f"conversion of “{filename}” to “{outname}” finished.")
            proclist.remove(item)
    # since manageprocs is called from a loop, keep CPU usage down.
    time.sleep(0.05)


if __name__ == "__main__":
    main()
I have left out setup(); it uses argparse to handle the command-line arguments.

Here, the thing to be processed is just a list of file names. But it could also be (in your case) a list of tuples of script names and arguments.
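Adapted to the question's workload, the same technique can schedule arbitrary shell command lines instead of files: one launcher that returns a Popen handle, and one manageprocs-style poll/remove loop that refills a slot the moment any job exits. A standalone sketch, with hypothetical `echo` jobs standing in for the real script invocations:

```python
import os
import subprocess as sp
import time

# Hypothetical job strings; in practice each one is a full
# "python scripts/foo.py -a ... -b ..." command line.
jobs = ["echo job-%d" % n for n in range(6)]

def start_job(cmdline):
    # Launch one command line without blocking the scheduling loop.
    return (cmdline, sp.Popen(cmdline, shell=True, stdout=sp.DEVNULL))

def manageprocs(proclist):
    # Remove finished processes; poll() is non-None once a process has
    # exited. Iterate over a copy so removal during iteration is safe.
    for item in proclist[:]:
        cmdline, proc = item
        if proc.poll() is not None:
            proclist.remove(item)
    time.sleep(0.05)  # called in a loop; keep CPU usage down

maxprocs = os.cpu_count() or 1
procs = []
for cmd in jobs:
    # Refill a slot as soon as any running job finishes, so there are
    # always maxprocs jobs running until the job list is exhausted.
    while len(procs) >= maxprocs:
        manageprocs(procs)
    procs.append(start_job(cmd))
# Drain the remaining processes.
while procs:
    manageprocs(procs)
```

Unlike the batch-of-K approach in the question, this never waits for a whole batch: the slowest job in a batch no longer holds up the next batch.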