How to save output when running a job on a cluster using SLURM

Question

I want to run an R script using SLURM. I have created the R script, "test.R", as shown:

print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")

I created a bash script, "submit.sh", to run this R script:

#!/bin/bash

#sbatch --job-name=test.job
#sbatch --output=.out/abc.out
Rscript  /home/abc/job_sub_test/test.R

And I submitted the job on the cluster:

sbatch submit.sh

I am not sure where my output is saved. I looked in the home directory, but there is no output file.

Edit

I also set my working directory in test.R, but nothing changed.

setwd("/home/abc")
print("Running the test script")
write.csv(head(mtcars), "mtcars_data_test.csv")

When I run the script without SLURM (Rscript test.R), it works fine and saves the output to the path I set.

Answer 1

Slurm sets the job's working directory to the directory from which the sbatch command was issued.

Assuming the /home directory is mounted on all compute nodes, you can change the working directory explicitly with cd in the submission script, or with setwd() in the R script. But that should not be necessary.

Three possibilities:

either the job did not start at all because of a configuration or hardware issue; you can find that out with the sacct command by looking at the state column;
or the file was indeed created, but on the compute node, on a filesystem that is not shared; in that case the best option is to SSH to the compute node (which you can identify with sacct) and look for the file there;
or the script crashed and the file was not created at all; in that case you should look into the output file of the job (.out/abc.out). Beware that the .out directory must be present before the job starts, and that, as its name starts with a ., it is hidden and shown by ls only with the -a option.
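
The three checks above might look like this in a shell session. This is only a sketch: the job ID 12345 is a placeholder for the number printed by sbatch, the JOBID and OUTDIR variables are hypothetical conveniences introduced here, and the sacct call is skipped when Slurm's tools are not on the PATH.

```shell
#!/bin/bash
# Placeholder job ID (assumption): replace with the ID printed by sbatch.
jobid="${JOBID:-12345}"

# Checks 1 and 2: job state, exit code, and the node(s) the job ran on.
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$jobid" --format=JobID,JobName,State,ExitCode,NodeList
else
    echo "sacct not found; run this on a cluster login node" >&2
fi

# Check 3: list the output directory, including hidden entries.
# ".out" is the directory named in the question's --output option;
# OUTDIR is a hypothetical override for other locations.
outdir="${OUTDIR:-$PWD/.out}"
if [ -d "$outdir" ]; then
    ls -la "$outdir"
else
    echo "directory $outdir does not exist" >&2
fi
```

If the State column shows something like FAILED, the job's output file is the next place to look; if it shows COMPLETED, the NodeList column tells you which node to inspect for the CSV file.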

Answer 2

The --output argument to sbatch is relative to the folder you submitted the job from. setwd inside the R script wouldn't affect it, because Slurm has already parsed that argument and started piping output to the file by the time the R script is running.

First, if you want the output to go to /home/abc/.out/, make sure you're in your home directory when you submit the script, or specify the full path in the --output argument.

Second, the .out folder has to exist; I tested this, and Slurm does not create it if it doesn't.
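
Putting both points together, a submission might be prepared as follows. This is a minimal sketch, not the asker's actual setup: it writes an illustrative submit.sh into a scratch directory (the demo variable and the directory layout are assumptions standing in for /home/abc), creates the output directory up front, and gives --output an absolute path. It also spells the directives as #SBATCH in upper case, because sbatch treats lower-case #sbatch lines as ordinary comments; in that case the job output goes to the default slurm-<jobid>.out file in the submission directory instead.

```shell
#!/bin/bash
# Hypothetical scratch location standing in for /home/abc.
demo="$(mktemp -d)"

# The directory named in --output must exist before the job starts.
mkdir -p "$demo/.out"

# Write the job script. The absolute --output path makes the log
# location independent of the directory sbatch is run from.
cat > "$demo/submit.sh" <<EOF
#!/bin/bash
#SBATCH --job-name=test.job
#SBATCH --output=$demo/.out/abc.out
Rscript $demo/test.R
EOF

# Show the generated script and how it would be submitted.
cat "$demo/submit.sh"
echo "submit with: sbatch $demo/submit.sh"
```

The sketch stops short of calling sbatch so that it can be tried on any machine; on the cluster, the printed sbatch command is the one to run.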

Answer 1
1 2019-11-07 09:05:26

Answer 2
0 2019-11-13 20:51:14
