How to know the status of each process of one job in the Slurm cluster manager?
If the processes you mention are distinct steps, then sacct can give you the information, as explained by @Christopher Bottoms.
But if the processes are different tasks in a single step, then you can use this script, which uses parallel SSH to run 'ps' commands on the compute nodes and offers a summarised view, as @Tom de Geus suggests.
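To make the 'ps over SSH' idea concrete, here is a minimal sketch. It is not the script referred to above: it visits the nodes one after another rather than in parallel, it assumes you are allowed to SSH into the compute nodes of your own job, and the function name job_ps is made up for illustration.

```shell
#!/bin/bash
# job_ps (hypothetical helper): list your processes on every node of a job.
# Assumption: passwordless SSH to the job's compute nodes is permitted.
job_ps() {
    local jobid=$1 nodelist node user
    user=$(id -un)
    # %N prints the job's node list in compressed form, e.g. node[01-02]
    nodelist=$(squeue -j "$jobid" -h -o %N)
    # scontrol expands the compressed list to one hostname per line
    for node in $(scontrol show hostnames "$nodelist"); do
        echo "== $node =="
        # Sequential on purpose; a parallel SSH tool would be faster
        ssh "$node" ps -u "$user" -o pid,stat,etime,args
    done
}
```

Calling job_ps 8021 would then print one 'ps' listing per node of job 8021. The listing covers all of your processes on each node, so on nodes shared between several of your jobs it is not limited to that one job.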
Just use the command sacct that comes with Slurm.
Given this code (my.sh):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2
srun -n1 sleep 10 &
srun -n1 sleep 3
wait
I run it:
sbatch my.sh
And then check on it with sacct:
sacct
Which gives me per-step info:
JobID         JobName  Partition    Account  AllocCPUS      State ExitCode
---------- ---------- ---------- ---------- ---------- ---------- --------
8021        my.sbatch    CLUSTER         me          2    RUNNING      0:0
8021.0          sleep                    me          1    RUNNING      0:0
8021.1          sleep                    me          1  COMPLETED      0:0
sacct has a lot of options to customize its output. For example,
sacct --format='JobID%6,State'
will give you just the IDs (up to 6 characters) and the current state of the job and each of its steps:
JobID       State
------ ----------
8021      RUNNING
8021.0    RUNNING
8021.1  COMPLETED
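If you want to process the per-step states in a script rather than read them by eye, sacct can emit pipe-delimited output with --parsable2 and drop the header line with --noheader. The sketch below counts the steps in each state. The function name count_step_states is made up for illustration, and the script only calls sacct when a job ID is passed as the first argument.

```shell
#!/bin/bash
# count_step_states (hypothetical helper): count job steps per state.
# Expects "JobID|State" lines on stdin, as printed by
#   sacct -j <jobid> --format=JobID,State --parsable2 --noheader
count_step_states() {
    awk -F'|' '
        # A JobID containing a dot is a step (8021.0, 8021.batch, ...);
        # the bare job ID line (8021) is the allocation itself, so skip it.
        $1 ~ /\./ { count[$2]++ }
        END { for (s in count) printf "%s %d\n", s, count[s] }
    ' | sort
}

# Query a real job only when a job ID is given and sacct is installed.
if [ -n "${1:-}" ] && command -v sacct >/dev/null 2>&1; then
    sacct -j "$1" --format=JobID,State --parsable2 --noheader | count_step_states
fi
```

Saved under a name of your choice and run with the job ID as its argument (8021 in the example above), it prints one line per state followed by the number of steps in that state.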