After using the Slurm cluster manager to sbatch a job with multiple processes, is there a way to know the status (running or finished) of each process? Can this be done from a Python script?
If the processes you mention are distinct steps, then sacct can give you the information, as explained by @Christopher Bottoms.
But if the processes are different tasks in a single step, then you can use this script, which uses parallel SSH to run ps commands on the compute nodes and offers a summarised view, as @Tom de Geus suggests.
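A minimal Python sketch of that SSH-and-ps approach (assuming passwordless SSH to the compute nodes; the helper names `job_nodes`, `parse_ps`, and `remote_processes` are my own, not part of Slurm):

```python
#!/usr/bin/env python3
"""Illustrative sketch: inspect a job's tasks by running 'ps' on its nodes.
Assumes passwordless SSH to the compute nodes; helper names are made up."""
import subprocess

def job_nodes(jobid):
    """Expand the job's node list (e.g. node[01-03]) into individual hostnames."""
    nodelist = subprocess.run(
        ["squeue", "-h", "-j", str(jobid), "-o", "%N"],
        capture_output=True, text=True, check=True).stdout.strip()
    hosts = subprocess.run(
        ["scontrol", "show", "hostnames", nodelist],
        capture_output=True, text=True, check=True).stdout
    return hosts.split()

def parse_ps(output):
    """Turn 'ps -o pid=,stat=,comm=' lines into (pid, state, command) tuples."""
    procs = []
    for line in output.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3:
            procs.append((int(fields[0]), fields[1], fields[2]))
    return procs

def remote_processes(jobid, user):
    """Run ps over SSH on every node of the job and collect the results."""
    report = {}
    for node in job_nodes(jobid):
        out = subprocess.run(
            ["ssh", node, "ps", "-u", user, "-o", "pid=,stat=,comm="],
            capture_output=True, text=True, check=True).stdout
        report[node] = parse_ps(out)
    return report
```

This only sketches the idea; a real version would want timeouts and parallel connections rather than this serial loop.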
Just use the sacct command that comes with Slurm.
Given this script (my.sh):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2
srun -n1 sleep 10 &
srun -n1 sleep 3
wait
I run it:
sbatch my.sh
And then check on it with sacct:
sacct
Which gives me per-step info:
JobID JobName Partition Account AllocCPUS State ExitCode
---------- ---------- ---------- ---------- ---------- ---------- --------
8021 my.sbatch CLUSTER me 2 RUNNING 0:0
8021.0 sleep me 1 RUNNING 0:0
8021.1 sleep me 1 COMPLETED 0:0
sacct has a lot of options to customize its output. For example,
sacct --format='JobID%6,State'
will give you just the job IDs (truncated to 6 characters) and the current state of each job and step:
JobID State
------ ----------
8021 RUNNING
8021.0 RUNNING
8021.1 COMPLETED
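To answer the Python part of the question: a minimal sketch that wraps sacct with subprocess and parses its machine-readable output (--noheader and --parsable2 are real sacct flags; the `parse_sacct` and `step_states` helpers are my own names):

```python
#!/usr/bin/env python3
"""Minimal sketch: read per-step job states from Python by wrapping sacct.
The helper names are illustrative, not part of any library."""
import subprocess

def parse_sacct(text):
    """Parse pipe-delimited 'JobID|State' lines into a {step: state} dict."""
    states = {}
    for line in text.splitlines():
        if "|" in line:
            step, state = line.split("|", 1)
            states[step] = state
    return states

def step_states(jobid):
    """Query sacct for one job; --parsable2 gives pipe-delimited output
    and --noheader drops the header row, so parsing stays trivial."""
    out = subprocess.run(
        ["sacct", "-j", str(jobid), "--format=JobID,State",
         "--noheader", "--parsable2"],
        capture_output=True, text=True, check=True).stdout
    return parse_sacct(out)
```

For the example output above, step_states(8021) would yield {'8021': 'RUNNING', '8021.0': 'RUNNING', '8021.1': 'COMPLETED'}, which a script can then poll until every step reports COMPLETED.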