
Get SLURM job ID from job started by strigger

I have an R analysis composed of three parts (partA, partB, and partC). I submit each part to SLURM (e.g. sbatch partA), and each part is parallelized via #SBATCH --array=1-1500. The parts must run in series, so I need to wait for one part to finish before starting the next. Right now I'm starting each job manually, which is not a great solution.

I would like to automate the three sbatch calls. For example:

  1. sbatch partA
  2. when partA is done, sbatch partB
  3. when partB is done, sbatch partC

I used this solution to get the job ID of partA and pass it to strigger to accomplish step 2 above. However, I'm stuck at that point because I don't know how to get the job ID of partB from strigger. Here's what my code looks like:

#!/bin/bash

# step 1: sbatch partA
partA_ID=$(sbatch --parsable partA.sh)

# step 2: sbatch partB
strigger --set --jobid=$partA_ID --fini --program=/path/to/partB.batch

# step 3: sbatch partC
... ?

How do I complete step 3?

strigger is not the proper tool to achieve that goal; it is aimed more at administrators than at regular users. Only the slurm user can actually set triggers (see the "Important note" in the strigger manpage).

In your case, you should submit all three jobs at once, with dependencies set among them.

For instance:

$ partA_ID=$(sbatch --parsable partA.sh)
$ partB_ID=$(sbatch --parsable --dependency=afterany:${partA_ID} partB.sh)
$ partC_ID=$(sbatch --parsable --dependency=afterany:${partB_ID} partC.sh)

This will submit three job arrays, but the second one will only start when all jobs in the first one have finished, and the third one will only start when all jobs in the second one have finished.
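The three submissions above can be wrapped in one script. As a sketch, assuming partA.sh, partB.sh, and partC.sh sit in the current directory (the submit_chain helper is made up for illustration; only sbatch's --parsable and --dependency flags are real):

```shell
#!/bin/bash
# Sketch: submit a list of job scripts, chaining each to the previous
# one with --dependency=afterany, so they run strictly in series.
set -euo pipefail

submit_chain() {
    local dep="" id script
    for script in "$@"; do
        if [ -z "$dep" ]; then
            # first job in the chain: no dependency
            id=$(sbatch --parsable "$script")
        else
            # subsequent jobs wait for the previous array to finish
            id=$(sbatch --parsable --dependency=afterany:"$dep" "$script")
        fi
        echo "submitted $script as job $id (dependency: ${dep:-none})"
        dep=$id
    done
}

# Only submit if sbatch is actually available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    submit_chain partA.sh partB.sh partC.sh
fi
```

Because --parsable prints only the job ID, each submission's ID can be fed directly into the next job's dependency.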

An alternative is:

$ partA_ID=$(sbatch --parsable partA.sh)
$ partB_ID=$(sbatch --parsable --dependency=aftercorr:${partA_ID}  partB.sh)
$ partC_ID=$(sbatch --parsable --dependency=aftercorr:${partB_ID}  partC.sh)

This will submit three job arrays, but each job in the second array will not start until the corresponding job in the first array (i.e. the job with the same $SLURM_ARRAY_TASK_ID) has finished. Likewise, each job in the third array will start only when its corresponding job in the second array has finished.
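To illustrate why aftercorr is useful here: task N of partB can assume the matching task N of partA is already done and consume its per-task output directly. A hypothetical partB.sh might look like this (the partA_out_<N>.rds file name pattern and the Rscript call are made up for illustration):

```shell
#!/bin/bash
#SBATCH --array=1-1500
# Hypothetical partB.sh: with --dependency=aftercorr, array task N starts
# only after task N of partA has finished, so it can safely read the
# per-task output file partA produced for the same index.
TASK=${SLURM_ARRAY_TASK_ID:-1}   # falls back to 1 when run outside SLURM
echo "partB task ${TASK}: reading partA_out_${TASK}.rds"
# Rscript partB.R "partA_out_${TASK}.rds"   # the actual analysis step
```

With afterany instead, every partB task would wait for all 1500 partA tasks; aftercorr lets the pipeline overlap the arrays task by task.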

For more details, see the --dependency section in the sbatch manpage.
