
Limit the number of running jobs in SLURM

I am queuing multiple jobs in SLURM. Can I limit the number of jobs that run in parallel?

Thanks in advance!

If you are not the administrator, you can hold some jobs with scontrol hold <JOBID> if you do not want them all to start at the same time, and you can delay the start of some jobs with sbatch --begin=YYYY-MM-DD. Also, if it is a job array, you can limit the number of array tasks that run concurrently with, for instance, --array=1-100%25, which creates 100 tasks in the array but lets only 25 of them run at a time.
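As a rough sketch of the hold approach, the helper below prints a scontrol hold command for every job ID after the first N. The name hold_all_but is made up for this example, and the function only prints commands; it does not run anything until you pipe its output to bash.

```shell
#!/bin/bash
# hold_all_but N ID...: print a "scontrol hold" command for every job ID
# after the first N. Hypothetical helper: it only prints, it runs nothing.
hold_all_but() {
  local keep="$1" i=0 id
  shift
  for id in "$@"; do
    i=$((i + 1))
    if [ "$i" -gt "$keep" ]; then
      echo "scontrol hold $id"
    fi
  done
}
```

For example, hold_all_but 2 101 102 103 104 prints hold commands for jobs 103 and 104 only. A held job stays pending until you let it go again with scontrol release <JOBID>.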

According to the SLURM Resource Limits documentation, you can limit the total number of jobs that can run at one time for an association/QOS with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optionally) partition name.

You should be able to do something similar to:

sacctmgr modify user <userid> account=<account_name> set MaxJobs=10

I found this presentation to be very helpful in case you have more questions.

According to the SLURM documentation, --array=0-15%4 (with a - sign, not a :) limits the number of simultaneously running tasks from this job array to 4.
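To make the syntax concrete, here is a small sketch that builds the --array value from its three parts. The function name array_spec and the script name job.sh in the note below are placeholders of mine, not anything provided by SLURM.

```shell
#!/bin/bash
# array_spec FIRST LAST LIMIT: print an --array value such as 1-100%25.
# "-" separates the two ends of the range and "%" introduces the cap on
# simultaneously running tasks.
array_spec() {
  local first="$1" last="$2" limit="$3"
  if [ "$first" -gt "$last" ] || [ "$limit" -lt 1 ]; then
    echo "array_spec: need FIRST <= LAST and LIMIT >= 1" >&2
    return 1
  fi
  printf '%s-%s%%%s\n' "$first" "$last" "$limit"
}
```

You would then submit with something like sbatch --array="$(array_spec 0 15 4)" job.sh. If the array is already queued, scontrol update JobId=<JOBID> ArrayTaskThrottle=<N> should let you change the cap without resubmitting.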

I wrote test.sbatch:

#!/bin/bash
# test.sbatch
#
#SBATCH -J a
#SBATCH -p campus
#SBATCH -c 1
#SBATCH -o %A_%a.output

mkdir test${SLURM_ARRAY_TASK_ID}

# Sleep for a random time of up to 10 minutes so the tasks show up in squeue
# and finish at different times, to check that the number of parallel jobs
# stays constant.
RANGE=600
number=$((RANDOM % RANGE))
echo "$number"

sleep "$number"

and ran it with sbatch --array=1-15%4 test.sbatch

Jobs ran as expected (always 4 in parallel); each one just created its directory and kept running for $number seconds.
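If you want to check the number of running tasks without reading squeue by eye, something along these lines might help. It is only a sketch: count_running and running_tasks are names I made up, and it assumes squeue is on your PATH and that you pass the array's job ID.

```shell
#!/bin/bash
# Count the lines whose second field is RUNNING in a "jobid state"
# listing read from stdin.
count_running() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# Report how many tasks of array job $1 are running right now.
# -h: no header, -r: one line per array task, %i: job ID, %T: job state.
running_tasks() {
  squeue -h -r -j "$1" -o '%i %T' | count_running
}
```

Calling running_tasks <JOBID> every so often while the array is active should never report more than the limit given after the % sign.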

Comments and suggestions are appreciated.

If your jobs are relatively similar, you can use SLURM job arrays. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm

#!/bin/bash -x
#SBATCH --mail-type=NONE
#SBATCH --array=1-419%25  # Submit 419 tasks with only 25 of them running at any time

#contains the list of 419 commands I want to run
cmd_file=s1List_170519.txt

# Get the first field of line number SLURM_ARRAY_TASK_ID
cmd_line=$(awk -v var="${SLURM_ARRAY_TASK_ID}" 'NR==var {print $1}' "$cmd_file")

$cmd_line  #may need to be piped to bash
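One thing to watch: print $1 returns only the first whitespace-separated field, so if each line of the list is a full command with arguments, you probably want the whole line. A possible variant is sketched below; nth_line is a helper name I made up, and running the line through bash -c is my assumption about how the commands should be executed.

```shell
#!/bin/bash
# Print line number $1 of file $2 (the complete line, not only its first
# field), and stop reading the file once that line has been printed.
nth_line() {
  sed -n "${1}{p;q;}" "$2"
}

# Inside an array task SLURM sets SLURM_ARRAY_TASK_ID; outside of one
# this part does nothing.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
  cmd_file=s1List_170519.txt   # the command list from the script above
  cmd_line=$(nth_line "$SLURM_ARRAY_TASK_ID" "$cmd_file")
  bash -c "$cmd_line"          # run through bash so pipes and quotes work
fi
```

Task 7 of the array would then run whatever is on line 7 of the list, arguments included.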
