How to run multiple Python scripts simultaneously from a wrapper script in such a way that CPU utilization is maximized?
I have to run about 200-300 Python scripts daily, each with different arguments, for example:
python scripts/foo.py -a bla -b blabla ..
python scripts/foo.py -a lol -b lolol ..
....
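Assume that I already have all of these arguments for each script in a list, and that I would like to execute the scripts concurrently so that the CPU is always busy. How can I do that?
My current solution:
Script for running multiple processes: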
workers = 15
for i in range(0, len(jobs), workers):
    job_string = ""
    for j in range(i, min(i + workers, len(jobs))):
        job_string += jobs[j] + " & "
    if len(job_string) == 0:
        continue
    print(job_string)
    val = subprocess.check_call("./scripts/parallelProcessing.sh '%s'" % job_string, shell=True)
scripts/parallelProcessing.sh (used in the script above):
echo $1
echo "running scripts in parallel"
eval $1
wait
echo "done processing"
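Drawback:
I am executing the processes in batches of K: one batch of K processes, then another K, and so on. But CPU core utilization drops as the number of running processes in a batch keeps decreasing, until eventually only one process is running at a time (for a given batch). As a result, the time taken to complete all the processes is significant.
One simple solution would be to ensure that K processes are always running, i.e. as soon as a running process finishes, a new one is scheduled. But I am not sure how to implement such a solution.
Expectations:
As the task is not very latency sensitive, I am looking for a simple solution that keeps the CPU busy most of the time.
Note: Any two of these processes can execute simultaneously without any concurrency issues. The host on which these processes run has python2.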
This is a technique that I developed for calling many external programs using subprocess.Popen. In this example, I call convert to generate JPEG images from DICOM files.
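In short, it uses manageprocs to keep checking a list of running subprocesses. If one has finished, it is removed from the list and a new one is started, as long as unprocessed files remain. After that, the remaining processes are watched until they have all finished.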
from datetime import datetime
from functools import partial
import argparse
import logging
import os
import subprocess as sp
import sys
import time


def main():
    """
    Entry point for dicom2jpg.
    """
    args = setup()
    if not args.fn:
        logging.error("no files to process")
        sys.exit(1)
    if args.quality != 80:
        logging.info(f"quality set to {args.quality}")
    if args.level:
        logging.info("applying level correction.")
    start_partial = partial(start_conversion, quality=args.quality, level=args.level)

    starttime = str(datetime.now())[:-7]
    logging.info(f"started at {starttime}.")
    # List of subprocesses
    procs = []
    # Do not launch more processes concurrently than your CPU has cores.
    # That will only lead to the processes fighting over CPU resources.
    maxprocs = os.cpu_count()
    # Launch and manage subprocesses for all files.
    for path in args.fn:
        while len(procs) == maxprocs:
            manageprocs(procs)
        procs.append(start_partial(path))
    # Wait for all subprocesses to finish.
    while len(procs) > 0:
        manageprocs(procs)
    endtime = str(datetime.now())[:-7]
    logging.info(f"completed at {endtime}.")


def start_conversion(filename, quality, level):
    """
    Convert a DICOM file to a JPEG file.
    Removing the blank areas from the Philips detector.

    Arguments:
        filename: name of the file to convert.
        quality: JPEG quality to apply
        level: Boolean to indicate whether level adjustment should be done.

    Returns:
        Tuple of (input filename, output filename, subprocess.Popen)
    """
    outname = filename.strip() + ".jpg"
    size = "1574x2048"
    args = [
        "convert",
        filename,
        "-units",
        "PixelsPerInch",
        "-density",
        "300",
        "-depth",
        "8",
        "-crop",
        size + "+232+0",
        "-page",
        size + "+0+0",
        "-auto-gamma",
        "-quality",
        str(quality),
    ]
    if level:
        args += ["-level", "-35%,70%,0.5"]
    args.append(outname)
    proc = sp.Popen(args, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
    return (filename, outname, proc)


def manageprocs(proclist):
    """Check a list of subprocesses for processes that have ended and
    remove them from the list.

    Arguments:
        proclist: List of tuples. The last item in the tuple must be
                  a subprocess.Popen object.
    """
    # Iterate over a copy: removing items from the list being iterated
    # would make the loop skip the entry after each removed one.
    for item in proclist[:]:
        filename, outname, proc = item
        if proc.poll() is not None:
            logging.info(f"conversion of “{filename}” to “{outname}” finished.")
            proclist.remove(item)
    # since manageprocs is called from a loop, keep CPU usage down.
    time.sleep(0.05)


if __name__ == "__main__":
    main()
I have left out setup(); it uses argparse to deal with the command-line arguments.
Here, the thing to be processed is just a list of file names. But it could also be (in your case) a list of tuples of script names and arguments.
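As a rough sketch of that adaptation, the same polling idea can be applied to a list of commands instead of file names. The function name run_jobs and its parameters maxprocs and poll_interval are made up for illustration. Because the question mentions a python2 host, the sketch sticks to calls that should be available in both Python 2.7 and Python 3 (multiprocessing.cpu_count and subprocess.Popen) and avoids f-strings.

```python
import multiprocessing
import subprocess
import sys
import time


def run_jobs(jobs, maxprocs=None, poll_interval=0.05):
    """Run every command in jobs, keeping at most maxprocs children alive.

    jobs is a list of argument lists, e.g. ["python", "script.py", "-a", "x"].
    Returns a list of (args, returncode) tuples in order of completion.
    """
    if maxprocs is None:
        maxprocs = multiprocessing.cpu_count()
    pending = list(jobs)  # commands that have not been started yet
    running = []          # (args, Popen) pairs for live children
    finished = []         # (args, returncode) pairs
    while pending or running:
        # Start new children until the limit is reached or nothing is left.
        while pending and len(running) < maxprocs:
            args = pending.pop(0)
            running.append((args, subprocess.Popen(args)))
        # Separate the children that have exited from those still running.
        still_running = []
        for args, proc in running:
            returncode = proc.poll()
            if returncode is None:
                still_running.append((args, proc))
            else:
                finished.append((args, returncode))
        running = still_running
        # Sleep briefly so the polling loop itself does not burn CPU.
        if running:
            time.sleep(poll_interval)
    return finished


if __name__ == "__main__":
    # Stand-in commands; real jobs would name the scripts and their arguments.
    demo = [[sys.executable, "-c", "print(%d)" % n] for n in range(4)]
    for args, returncode in run_jobs(demo, maxprocs=2):
        print("exit code %d: %s" % (returncode, " ".join(args)))
```

For the scripts in the question, each job would be an argument list such as ["python", "scripts/foo.py", "-a", "bla", "-b", "blabla"]. Passing a list rather than a shell string avoids the eval step in parallelProcessing.sh, and a new script starts as soon as any running one exits, so the number of live processes stays at maxprocs until the job list is exhausted.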