
How to save output when running job on cluster using SLURM

I want to run an R script using SLURM. I have created the R script, "test.R", as shown:

print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")

I created a bash script, "submit.sh", to run this R script:

#!/bin/bash

#sbatch --job-name=test.job
#sbatch --output=.out/abc.out
Rscript  /home/abc/job_sub_test/test.R

And I submitted the job on the cluster:

sbatch submit.sh

I am not sure where my output is saved. I looked in the home directory but found no output file.

Edit

I also set my working directory in test.R, but it made no difference.

setwd("/home/abc")
print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")

When I run the script without SLURM (Rscript test.R), it works fine and saves the output according to the set path.

Slurm sets the job's working directory to the directory from which the sbatch command was issued.

Assuming the /home directory is mounted on all compute nodes, you can change the working directory explicitly with cd in the submission script, or with setwd() in the R script. But that should not be necessary.
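To see which directory and node a job actually uses, a few diagnostic lines can go at the top of the submission script. This is only a sketch: SLURM_SUBMIT_DIR is an environment variable Slurm sets inside a job, and the fallback to $PWD is an assumption added here so the snippet also runs outside Slurm.

```shell
#!/bin/bash
# Sketch: diagnostic lines for the top of a submission script.
# SLURM_SUBMIT_DIR is set by Slurm inside a job; the $PWD fallback is
# only there so the snippet can be tried outside Slurm as well.
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
cd "$submit_dir" || exit 1

node="$(uname -n)"
echo "Job running on node: $node"
echo "Working directory:   $(pwd)"

# The actual work would follow here, for example:
# Rscript /home/abc/job_sub_test/test.R
```

These lines end up in the job's output file, so they also confirm that the output file itself is being written where you expect.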

Three possibilities:

  • the job did not start at all because of a configuration or hardware issue; you can check this with the sacct command by looking at the State column;
  • the file was indeed created, but on the compute node, on a filesystem that is not shared; in that case the best option is to SSH to the compute node (which you can identify with sacct) and look for the file there; or
  • the script crashed and the file was not created at all; in that case you should look into the output file of the job (.out/abc.out). Beware that the .out directory must exist before the job starts, and that, because its name starts with a ., it is hidden and shows up in ls only with the -a option.
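The checks above can be sketched roughly as follows. The job ID 12345 is a hypothetical placeholder (use the number sbatch printed on submission), and the sacct call is guarded so the snippet still runs on a machine without Slurm. The second half uses a throwaway temporary directory to show why a directory named .out is easy to miss.

```shell
#!/bin/bash
# Hypothetical job ID for illustration; use the number sbatch printed.
jobid="12345"

# Job state and the node it ran on (requires Slurm accounting).
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$jobid" --format=JobID,JobName,State,ExitCode,NodeList
else
    echo "sacct not available here; run this on the cluster" >&2
fi

# A directory named .out is skipped by plain ls.
demo="$(mktemp -d)"
mkdir "$demo/.out"
plain="$(ls "$demo")"      # empty: hidden entries are not listed
all="$(ls -a "$demo")"     # lists . and .. as well as .out
echo "ls:    '$plain'"
echo "ls -a: $all"
rm -rf "$demo"
```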

The --output argument to sbatch is relative to the folder you submitted the job from. setwd inside the R script wouldn't affect it, because Slurm has already parsed that argument and started piping output to the file by the time the R script is running.

First, if you want the output to go to /home/abc/.out/, make sure you're in your home directory when you submit the script, or specify the full path in the --output argument.

Second, the .out folder has to exist; I tested this, and Slurm does not create it if it doesn't.
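Both points can be combined into a short pre-submission step. This is a minimal sketch, assuming the log directory is $HOME/.out and the script is called submit.sh as in the question; the sbatch call is guarded so the snippet can be tried on a machine without Slurm.

```shell
#!/bin/bash
# Sketch: make sure the log directory exists, then submit with an
# absolute --output path so the submit directory does not matter.
log_dir="$HOME/.out"
mkdir -p "$log_dir"        # no error if the directory already exists

if command -v sbatch >/dev/null 2>&1; then
    sbatch --job-name=test.job --output="$log_dir/abc.out" submit.sh
else
    echo "sbatch not found; would run: sbatch --output=$log_dir/abc.out submit.sh" >&2
fi
```

Options given on the sbatch command line take precedence over directives inside the script. Note also that the sbatch documentation writes in-script directives as #SBATCH in uppercase; if the lowercase #sbatch lines in the question are not picked up, the job's output would go to the default slurm-<jobid>.out file in the directory the job was submitted from, so it is worth looking there too.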


 