R code on a Slurm cluster is not read correctly

Question

I'm running R code on a Slurm cluster with the following ".sh" file:

#!/bin/bash
#SBATCH --partition=p_parallel
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16
#SBATCH --workdir=/work/uder2/ODE/lancio/
module load statistics/r-3.6.1
srun Rscript   TEST.R

The R code is quite simple. Something like:

DIRbase     = "/work/uder2/ODE/"
DIRdata     = paste(DIRbase,"data/",sep="")
list.files(DIRdata)
load(paste(DIRdata,"Data.Rdata",sep=""))


NAME = "PriorU" 
ialg = 3

nG  = 500  
LimEta = 40  

LimMu2  = 15 
LimMin = 500

LimMu = 0.1
LimSpike = 10
LimSigma2 = (8)^2/(-2*log(LimMu))*1.2


NAME = paste(NAME,"_ng",nG, sep="")

### ### ### ### ### ### ### ### 
### MODELS
### ### ### ### ### ### ### ### 

DATA = allGenesData
nrowData = nrow(DATA$premature)


sd1 = as.numeric(apply(DATA$premature,1,var))
sd2 = as.numeric(apply(DATA$mature,1,var))
sd3 = as.numeric(apply(DATA$nascent,1,var))

epsi = 0.000001
App = c(which(sd1<=epsi),which(sd2<=epsi),which(sd3<=epsi))
App2 = c(which(sd1>50),which(sd2>100000),which(sd3>1500))

minep = 0.1
xy1 = as.numeric(apply(DATA$premature,1,min))
xy2 = as.numeric(apply(DATA$mature,1,min))
xy3 = as.numeric(apply(DATA$nascent,1,min))
App3 = c(which(xy1<=minep),which(xy2<=minep),which(xy3<=minep))

In reality, the code is much longer, but I don't think the content of the file is important.

What happens is that, sometimes, the code is not read properly. For example, instead of

App3 = c(which(xy1<=minep),which(xy2<=minep),which(xy3<=minep))

what is read is

App3  which(xy1<=minep),which(xy2<=minep),which(xy3<=minep))

Then, without touching the code, launching the ".sh" file again reads the code properly. This happens "randomly", and never in the same section of the code.

It seems to be related to the code length.

Any help?

Thanks

EDIT 1:

As an example, the output of a Slurm job is

[1] "Data.Rdata"
Loading required package: MASS
##
## Markov Chain Monte Carlo Package (MCMCpack)
## Copyright (C) 2003-2020 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
##
## Support provided by the U.S. National Science Foundation
## (Grants SES-0350646 and SES-0350613)
##
Loading required package: stats4
null device 
          1 
Error: unexpected symbol in:
"      Beta0   = rep(-4,3),
      Betagonale Psi"
Execution halted
srun: error: node02: task 0: Exited with exit code 1

and the code is

priors  = list(
     Beta0 = list(
         type        = "Normal",
         Par1        = rep(-4,3),
         Par2        = rep(10,3)
       ),
       Beta1 = list(
         type        = "Normal",
         Par1        = rep(1.8,3), 
         Par2        = rep(10,3)
       ),
      VarK   = list(
        type        = "TruncatedNormal",
        Par1        = rep(0,3),
        Par2        = rep(100,3),
        Par3        = rep(0.0000000,3),
        Par4        = rep(LimSigma2,3), 
        Par5        = rep(2,3)
        #Par5        = rep(2,3)
      ), 
      RegCoef = list(
          type        = "Normal",
          Par1        = c(0,0,0,0,0), ## (1 o stessa dimension)
          Par2        = rep(100,5)
      ),
      sigmaMat = list(
          type        = "InverseWishart",
          Par1        = rep(10,3), 
          Par2        = c(diag(1,5)) ## diagonale Psi
      ),

      DPpar = list(
          type        = "Gamma",
          Par1        = 1, 
          Par2        = 1 ## diagonale Psi
      )
    )

Answer 1

The symptom described here, a file stored on an NFS server that appears corrupt when read, is most often associated with a race condition on the file. Typically the file is open for writing on one NFS client (the login node) and open for reading on another client (a compute node). As there is no global lock mechanism in NFS, the client that is reading the file does not know that the file is being written. With advanced editors that support auto-save, the file can sometimes be written to disk in an inconsistent state, for instance in the middle of a copy/paste operation.
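One way to check whether this race is actually occurring is to record a checksum of the script at submission time and compare it on the compute node before running. The following is only a minimal sketch: the helper names `file_digest` and `check_unchanged` and the variable `R_DIGEST` are hypothetical, and it assumes GNU `sha256sum` is available on both the login and compute nodes.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they are not part of Slurm or R.

# Print the SHA-256 digest of a file (first field of the sha256sum output).
file_digest() {
    sha256sum -- "$1" | cut -d' ' -f1
}

# Return 0 if the file still matches the expected digest, 1 otherwise.
check_unchanged() {
    local file="$1" expected="$2"
    local actual
    actual="$(file_digest "$file")"
    if [ "$actual" != "$expected" ]; then
        echo "WARNING: $file differs from the submitted version" >&2
        return 1
    fi
}

# Possible use (assumed workflow, adapt to your site):
#   on the login node:
#     sbatch --export=ALL,R_DIGEST="$(file_digest TEST.R)" job.sh
#   inside job.sh, before "srun Rscript TEST.R":
#     check_unchanged TEST.R "$R_DIGEST" || exit 1
```

A mismatch reported on the compute node would support the race-condition explanation. The check does not remove the race, since the file could still change between the check and the moment Rscript reads it.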

One option in that scenario is to avoid modifying the file at all while jobs are submitted, or at least to deactivate auto-save.

Another option is to make a copy of the file before the job is submitted, so that it is not updated afterwards.
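The copy-before-submit approach could look roughly like the sketch below. The function name `snapshot_script`, the `snapshots` directory, and the `R_SCRIPT` variable are assumptions made for illustration; the job script would need to run `srun Rscript "$R_SCRIPT"` instead of the hard-coded `TEST.R`.

```shell
#!/bin/bash
# Hypothetical helper: freeze a script into a private, uniquely named copy
# so that later edits to the original cannot affect a submitted job.
snapshot_script() {
    local src="$1"
    local dest_dir="${2:-snapshots}"   # assumed default directory name
    local base copy
    mkdir -p "$dest_dir"
    base="$(basename "$src")"
    # mktemp creates a new empty file with a unique suffix in dest_dir.
    copy="$(mktemp "$dest_dir/${base%.*}.XXXXXX")"
    cp -- "$src" "$copy"
    # Print the path of the frozen copy so the caller can pass it on.
    printf '%s\n' "$copy"
}

# Possible use on the login node (assumed workflow):
#   frozen="$(snapshot_script TEST.R)"
#   sbatch --export=ALL,R_SCRIPT="$frozen" job.sh
```

Because each submission gets its own copy, the original file can keep being edited while jobs are queued or running.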

Answer 1 (accepted, 2 votes), 2020-03-05 15:32:34