
Get SLURM job ID from job started by strigger

I have an R analysis composed of three parts (partA, partB, and partC). I submit each part to SLURM (e.g. sbatch partA), and each part is parallelized via #SBATCH --array=1-1500. The parts run in serial, so I need to wait for one to finish before starting the next. Right now I'm starting each job manually, which is not a great solution.

I would like to automate the three sbatch calls. For example:

  1. sbatch partA
  2. when partA is done, sbatch partB
  3. when partB is done, sbatch partC

I used this solution to get the job ID of partA, and pass that to strigger to accomplish step 2 above. However, I'm stuck at that point, because I don't know how to get the job ID of partB from strigger. Here's what my code looks like:

#!/bin/bash

# step 1: sbatch partA
partA_ID=$(sbatch --parsable partA.sh)

# step 2: sbatch partB
strigger --set --jobid=$partA_ID --fini --program=/path/to/partB.batch

# step 3: sbatch partC
... ?

How do I complete step 3?

strigger is not the proper tool for this goal; it is aimed more at administrators than at regular users. Only the slurm user can actually set triggers (see the "Important note" in the strigger manpage).

In your case, you should submit all three jobs at once, with dependencies set among them.

For instance:

$ partA_ID=$(sbatch --parsable partA.sh)
$ partB_ID=$(sbatch --parsable --dependency=afterany:${partA_ID} partB.sh)
$ partC_ID=$(sbatch --parsable --dependency=afterany:${partB_ID} partC.sh)

This submits all three job arrays at once, but the second one will only start once all jobs in the first one have finished, and the third one will only start once all jobs in the second one have finished.
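One caveat: afterany fires even when tasks of the previous array failed. If partB should run only when every partA task exited successfully, the afterok dependency type can be used instead. A minimal sketch, assuming the same script names as above; the stub sbatch function is only a stand-in so the snippet can be dry-run on a machine without SLURM (a real cluster's sbatch on PATH takes precedence):

```shell
#!/bin/bash
# Stub so this sketch can be dry-run where SLURM is absent; it just
# prints a fake job ID. On a real cluster the genuine sbatch is used.
if ! command -v sbatch >/dev/null 2>&1; then
    sbatch() { echo 1000; }
fi

# partB starts only if *all* tasks of the partA array exited with code 0;
# if any task fails, partB stays pending (DependencyNeverSatisfied).
partA_ID=$(sbatch --parsable partA.sh)
partB_ID=$(sbatch --parsable --dependency=afterok:"${partA_ID}" partB.sh)
echo "partA=${partA_ID} partB=${partB_ID}"
```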

An alternative is

$ partA_ID=$(sbatch --parsable partA.sh)
$ partB_ID=$(sbatch --parsable --dependency=aftercorr:${partA_ID} partB.sh)
$ partC_ID=$(sbatch --parsable --dependency=aftercorr:${partB_ID} partC.sh)

This also submits three job arrays, but each job of the second one will not start until the corresponding job in the first one (i.e. the job with the same $SLURM_ARRAY_TASK_ID) has finished, and likewise each job in the third one will start only when the corresponding job in the second one has finished.

For more details, see the --dependency section of the sbatch manpage.
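Putting it together, the three manual sbatch calls from the question reduce to a single submit script. A sketch using the afterany variant and the script names above; the stub sbatch function is an assumption added only so the script can be dry-run off-cluster:

```shell
#!/bin/bash
set -u

# Stub so the sketch can be dry-run where SLURM is absent; on a real
# cluster the genuine sbatch is found on PATH and used instead.
if ! command -v sbatch >/dev/null 2>&1; then
    sbatch() { echo 1000; }
fi

# Submit all three array jobs at once; SLURM holds each one in the
# queue until all tasks of its predecessor have finished.
partA_ID=$(sbatch --parsable partA.sh)
partB_ID=$(sbatch --parsable --dependency=afterany:"${partA_ID}" partB.sh)
partC_ID=$(sbatch --parsable --dependency=afterany:"${partB_ID}" partC.sh)

echo "Submitted chain: ${partA_ID} -> ${partB_ID} -> ${partC_ID}"
```

Because the dependencies are recorded at submission time, the script returns immediately; there is no need to poll or keep a shell session open while the jobs run.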
