
Create jobs with slurm within python script that iterates over items in a list

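Background:
I wrote a Python script that converts files from one format to another. The code takes a text file (subject_list.txt) as input and iterates over the source directory names listed in it (several hundred directories, each containing thousands of files), converting their contents and storing them in a specified output directory.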

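Problem:
To save time, I would like to run this script on a high-performance cluster (HPC) and create jobs that convert the files in parallel, rather than iterating over each directory in the list sequentially.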

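I am new to both Python and HPCs. Our lab used to write mainly in BASH and had no access to an HPC environment, but we recently gained access to one and decided to switch to Python, so everything is fairly new.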

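Question:
Is there a module in Python that would let me create jobs from within the Python script? I have found documentation on the multiprocessing and subprocess Python modules, but it is not clear to me how to use them. Or should I take a different approach? I have also read a number of Stack Overflow posts about using slurm and Python together, but I am stuck with too much information and not enough knowledge to tell which thread to follow. Any help is greatly appreciated.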

Environment:
HPC: Red Hat Enterprise Linux Server release 7.4 (Maipo)
python3/3.6.1
slurm 17.11.2

Housekeeping part of the code:

import os
import subprocess

# Change this for your study
group="labname"
study="studyname"

# Set paths
archivedir="/projects/" + group + "/archive"
sourcedir="/projects/" + group + "/shared/DICOMS/" + study
niidir="/projects/" + group + "/shared/" + study + archivedir + "/clean_niftis"
outputlog=niidir + "/outputlog_convert.txt"
errorlog=niidir + "/errorlog_convert.txt"
dcm2niix="/projects/" + group + "/shared/dcm2niix/build/bin/dcm2niix"

# Source the subject list (needs to be in your current working directory)
subjectlist="subject_list.txt" 

# Check/create the log files
def touch(path): # create the file if it is missing and update its timestamps
    with open(path, 'a'): # append mode creates the file without truncating it
        os.utime(path, None) # set the access and modified times to now

if not os.path.isfile(outputlog): # if the file does not exist...
    touch(outputlog)
if not os.path.isfile(errorlog):
    touch(errorlog)

Where I am stuck:

with open(subjectlist) as file:
    lines = file.readlines() 

for line in lines:
    subject=line.strip()
    subjectpath=sourcedir+"/"+subject
    if os.path.isdir(subjectpath):
        with open(outputlog, 'a') as logfile:
            logfile.write(subject+os.linesep)

        # Submit a job to the HPC with sbatch. This next line was not in the 
        # original script that works, and it isn't correct, but it captures
        # the gist of what I am trying to do (written in bash).
        sbatch --job-name dcm2nii_"${subject}" --partition=short --time 00:60:00 --mem-per-cpu=2G --cpus-per-task=1 -o "${niidir}"/"${subject}"_dcm2nii_output.txt -e "${niidir}"/"${subject}"_dcm2nii_error.txt 

        # This is what I want the job to do for the files in each directory:
        subprocess.call([dcm2niix, "-o", "-b y",  niidir, subjectpath])

    else:
        with open(errorlog, 'a') as logfile:
            logfile.write(subject+os.linesep)

Edit 1:
dcm2niix is the software used for the conversion, and it is available on the HPC. It takes the following flags and paths: -o -b y outputDirectory sourceDirectory

Edit 2 (solution):

with open(subjectlist) as file:
    lines = file.readlines() # set variable name to file and read the lines from the file
for line in lines:
    subject=line.strip()
    subjectpath=sourcedir+"/"+subject
    if os.path.isdir(subjectpath):
        with open(outputlog, 'a') as logfile:
            logfile.write(subject+os.linesep)
        # Create a job to submit to the HPC with sbatch 
        batch_cmd = 'sbatch --job-name dcm2nii_{subject} --partition=short --time 00:60:00 --mem-per-cpu=2G --cpus-per-task=1 -o {niidir}/{subject}_dcm2nii_output.txt -e {niidir}/{subject}_dcm2nii_error.txt --wrap="/projects/{group}/shared/dcm2niix/build/bin/dcm2niix -o {niidir} {subjectpath}"'.format(subject=subject,niidir=niidir,subjectpath=subjectpath,group=group)
        # Submit the job
        subprocess.call([batch_cmd], shell=True)
    else:
        with open(errorlog, 'a') as logfile:
            logfile.write(subject+os.linesep)
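
The command above goes through a shell as a single string, so a subject name or path containing spaces or shell metacharacters could break it. Below is a minimal sketch of the same submission built as an argument list instead. `build_sbatch_args` and `submit` are hypothetical helper names, and the paths in the example are placeholders rather than real locations. The sbatch options are the ones used above, plus `--parsable`, which asks sbatch to print only the job ID.

```python
import shlex
import subprocess


def build_sbatch_args(subject, niidir, subjectpath, dcm2niix):
    """Return the sbatch argument list for one subject (hypothetical helper)."""
    # The --wrap value is itself run by a shell on the compute node,
    # so each piece of the dcm2niix command is quoted individually.
    wrapped = " ".join(
        shlex.quote(part) for part in [dcm2niix, "-o", niidir, subjectpath]
    )
    return [
        "sbatch",
        "--parsable",
        "--job-name=dcm2nii_{}".format(subject),
        "--partition=short",
        "--time=00:60:00",
        "--mem-per-cpu=2G",
        "--cpus-per-task=1",
        "-o", "{}/{}_dcm2nii_output.txt".format(niidir, subject),
        "-e", "{}/{}_dcm2nii_error.txt".format(niidir, subject),
        "--wrap={}".format(wrapped),
    ]


def submit(args):
    """Run sbatch and return the job ID it prints (requires a Slurm cluster)."""
    result = subprocess.run(
        args, stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    # With --parsable the output is "jobid" or "jobid;cluster"
    return result.stdout.strip().split(";")[0]


if __name__ == "__main__":
    # Placeholder values for illustration only; nothing is submitted here.
    example = build_sbatch_args(
        "sub01", "/tmp/niftis", "/tmp/dicoms/sub01", "/path/to/dcm2niix"
    )
    print(example)
```

Inside the loop, `submit(build_sbatch_args(subject, niidir, subjectpath, dcm2niix))` would take the place of the `subprocess.call` line, and the returned job ID could be written to the output log next to the subject name.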

Here is a possible solution for your code. It has not been tested.

with open(subjectlist) as file:
    lines = file.readlines() 

for line in lines:
    subject=line.strip()
    subjectpath=sourcedir+"/"+subject
    if os.path.isdir(subjectpath):
        with open(outputlog, 'a') as logfile:
            logfile.write(subject+os.linesep)

        # Build the sbatch command for this subject; --wrap holds the
        # dcm2niix call that the job will run.
        cmd = 'sbatch --job-name dcm2nii_{subject} --partition=short --time 00:60:00\
        --mem-per-cpu=2G --cpus-per-task=1 -o {niidir}/{subject}_dcm2nii_output.txt\
        -e {niidir}/{subject}_dcm2nii_error.txt\
        --wrap="dcm2niix -o -b y {niidir} {subjectpath}"'.format(subject=subject,niidir=niidir,subjectpath=subjectpath)

        # Submit the job
        subprocess.call([cmd], shell=True)

    else:
        with open(errorlog, 'a') as logfile:
            logfile.write(subject+os.linesep)
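
Since you asked whether a different approach might fit: a Slurm job array is one alternative to submitting hundreds of separate jobs. The sketch below assumes the script is launched once with something like `sbatch --array=0-99 ...` for 100 subjects, in which case Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable for each task. `read_subjects` and `subject_for_task` are hypothetical helper names, and this has not been run on your cluster.

```python
import os


def read_subjects(path):
    """Return the non-empty, stripped lines of the subject list."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def subject_for_task(subjects, environ=os.environ):
    """Pick the subject that belongs to this array task.

    Slurm sets SLURM_ARRAY_TASK_ID for each task of an array job;
    here it is used as a zero-based index into the subject list.
    """
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    return subjects[task_id]


if __name__ == "__main__" and "SLURM_ARRAY_TASK_ID" in os.environ:
    # Only runs inside an array job; each task handles a single subject.
    subjects = read_subjects("subject_list.txt")
    subject = subject_for_task(subjects)
    print("This task would convert:", subject)
```

Each array task would then call dcm2niix directly for its own subject, so the Python script no longer needs to call sbatch at all.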
