Debugging R code when using Slurm

I am running simulations in R on a cluster. Each R file contains 100 models. Each model analyses a different data set. Cluster commands are included in a slurm file, shown below.

A small percentage of models apparently do not converge well enough to estimate the Hessian and an error is generated for these models. The errors are placed in an error log file. However, I cannot determine from looking at the parameter estimates, the error log file and the output log file which of the 100 models are generating the errors.

Here is an example of an error message:

Error in chol.default(fit$hessian) : 
  the leading minor of order 3 is not positive definite
Calls: chol2inv -> chol -> chol.default

Parameter estimates are returned despite these errors. Some standard errors (SEs) are huge, but I think SEs can sometimes be large even when no error message is returned.

Is it possible to add a line to my slurm file below that will generate a single log file containing both the errors and the rest of the output, with the errors shown in their original location (for example, where they appear when I run the code on my Windows laptop)? That way I could quickly determine which models were generating the errors by looking at the log file. I have been trying to think of a work-around, but have not come up with anything so far.

Here is a slurm file:

#!/bin/bash
#SBATCH -J JS_N200_301_400_Oct31_17c.R
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -N 1
#SBATCH -t 2000
#SBATCH -p community.q
#SBATCH -o JS_N200_301_400_Oct31_17c.out
#SBATCH -e JS_N200_301_400_Oct31_17c.err
#SBATCH --mail-user markwm@myuniversity.edu
#SBATCH --mail-type ALL
Rscript JS_N200_301_400_Oct31_17c.R

Not sure if this is what you want, but the R option error lets you control what happens with errors that you don't otherwise catch. For instance, setting

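options(error = function() {
  # Print the call stack that led to the error
  traceback(2L)
  # Save the call stack and its frames to the file last.dump.rda
  dump.frames(dumpto = "last.dump", to.file = TRUE)
})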

at the beginning of your *.R script, or in a .Rprofile startup script, will (a) print the traceback when an error occurs and, more importantly, (b) dump the call stack to the file last.dump.rda, which you can load in a fresh R session as:

dump <- get(load("last.dump.rda"))

Note that get(load( is not a mistake: load() returns the name of the object it restored, and get() then retrieves that object. Here dump is an object of class dump.frames, which lets you inspect the call stack and its contents.

You can of course customize error to do other things.

I learned from an IT person in charge of the cluster that I can have the error messages added to the output log simply by removing the reference to the error log (the -e line) from the slurm file. With no -e line, Slurm writes standard error to the same file as standard output. See below. It seems to be good enough.

I plan to also output the model number into the log at the beginning and the end of each model's output for added clarity (which I should have been doing from the start).

#!/bin/bash
#SBATCH -J JS_N200_301_400_Oct31_17c.R
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -N 1
#SBATCH -t 2000
#SBATCH -p community.q
#SBATCH -o JS_N200_301_400_Oct31_17c.out
#SBATCH --mail-user markwm@myuniversity.edu
#SBATCH --mail-type ALL
Rscript JS_N200_301_400_Oct31_17c.R
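
Once the errors and the regular output share one log, and each model's output is wrapped in marker lines, the failing models can be picked out mechanically. The following is only a minimal sketch under assumptions: the marker format "MODEL <n> START" / "MODEL <n> END" is hypothetical (for example printed from R with cat()), the function name find_failed_models is made up for illustration, and it presumes the error lines land in the log between the markers of the model that produced them.

```shell
#!/bin/bash
# Sketch: list the model number that each "Error in" line belongs to.
# ASSUMPTION: the R script prints "MODEL <n> START" and "MODEL <n> END"
# around each model (hypothetical markers, not part of the original script).

find_failed_models() {
  awk '
    /^MODEL [0-9]+ START$/ { current = $2 }   # remember the model being fitted
    /^MODEL [0-9]+ END$/   { current = "" }   # no model is active after END
    /^Error in / {
      who = (current == "") ? "unknown" : current
      print "model " who ": " $0
    }
  ' "$1"
}

# Stand-in log for demonstration only; a real run would use the .out file.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
MODEL 1 START
(parameter estimates for model 1)
MODEL 1 END
MODEL 2 START
Error in chol.default(fit$hessian) :
  the leading minor of order 3 is not positive definite
MODEL 2 END
MODEL 3 START
(parameter estimates for model 3)
MODEL 3 END
EOF

find_failed_models "$demo_log"
# prints: model 2: Error in chol.default(fit$hessian) :
rm -f "$demo_log"
```

On the cluster, the same function could be pointed at JS_N200_301_400_Oct31_17c.out after the job finishes. An error reported as "model unknown" would mean it appeared outside any START/END pair, which is worth checking by hand.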
