Slurm job array fails to run an Rscript with shapefiles

Question

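I want to run a job array via Slurm on an HPC cluster that intersects individual circular shapefiles with a large shapefile of census blocks and then saves the resulting intersection as a shapefile. I will then combine these individual shapefiles into one large file on my own machine. This is a way to avoid the parallelization problem I described in an earlier question, which was about an error when mapping over a list of sf (simple features) objects in R.

However, when I run the job array, I get the following error: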

sbatch: error: Batch job submission failed: Invalid job array specification

Here is a link to the R script, the .sh file, and the csv of filenames that I use on the HPC cluster: https://github.com/msghankinson/slurm_job_array.

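The R code relies on 3 files:

"buffers": these are the circular polygons. I have split a large shapefile of 3,086 circles into 3,086 individual shapefiles, each containing 1 circle (saved on /lustre/ in the "lihtc_bites" folder). The goal of the R script is to intersect 1 circle with the census blocks in each run of the script and then save that intersection as a shapefile. I will then combine these 3,086 intersection shapefiles into a single data frame on my own laptop. For the reprex, I include only 2 of the 3,086 shapefiles.
"lihtc": this is the shapefile I use as an index in the R function. There are 3 versions of this shapefile. Each circle shapefile matches one of these "lihtc" shapefiles. For the reprex, I include only the one shapefile that matches my 2 circle shapefiles.
"blocks": these are the 710,000 census blocks. This file stays the same for every run of the R script, no matter which circle is used in the intersection. For the reprex, I include only a shapefile of the 7,386 blocks in San Francisco County.

I have run the R code on a specific single buffer and lihtc shapefile, and the function works. So my main concern is the .sh file that launches the job array ("lihtc_array_example.sh"). There, I try to run my R script on each "buffer" shapefile, using the task ID and "master_example.csv" (also in the reprex) to define which files get loaded into R. Each row of master_example.csv contains the buffer filename and the lihtc filename I need. These filenames need to be passed to the R script and used to load the correct files for each intersection. For example, task 1 loads the files listed in row 1 of master_example.csv. The code I found tries to pull these names into the .sh file via: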

shp_filename=$( echo "$line_N" | cut -d "," -f 2 )
lihtc_filename=$( echo "$line_N" | cut -d "," -f 3 )

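While I know that running the reprex is difficult, I would like to know whether there is any obvious breakdown in the pipeline between the .sh file, the csv of names, and the R script. I am happy to provide any additional information that may be useful.

The full .sh file, for ease of access: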


#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -p defq
#SBATCH -N 1
#SBATCH -o jobArrayScript_%A_%a.out
#SBATCH -e jobArrayScript_%A_%a.err
#SBATCH -a 1-3086%1000

line_N=$( awk "NR==$SLURM_ARRAY_TASK_ID" master_example.csv )  # NR means row-# in Awk
shp_filename=$( echo "$line_N" | cut -d "," -f 2 )
lihtc_filename=$( echo "$line_N" | cut -d "," -f 3 )

module load R/4.1.1
module load libudunits2/2.2.28
module load gdal/3.5.0
module load proj/6.3.0
module load geos/3.10.3

Rscript slurm_job_array.R $shp_filename $lihtc_filename
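
One way to check the lookup step on its own, without Slurm, is to call it with a task ID chosen by hand. The sketch below is only an illustration: the CSV contents, the filenames, and the helper name lookup_fields are all made up, and it assumes the csv has no header row and stores its fields in double quotes. If the real file has a header, every task would be off by one row.

```shell
#!/bin/bash
# Hypothetical stand-in for master_example.csv: column 1 is a row label,
# columns 2 and 3 hold the buffer and lihtc filenames (all values invented).
csv=$(mktemp)
cat > "$csv" <<'EOF'
"1","buffer_0001.shp","lihtc_a.shp"
"2","buffer_0002.shp","lihtc_a.shp"
EOF

# Same awk/cut lookup as in the batch script, wrapped in a function so the
# task ID can be supplied by hand instead of by $SLURM_ARRAY_TASK_ID.
lookup_fields() {
  local task_id=$1 file=$2 line
  line=$(awk -v n="$task_id" 'NR == n' "$file")
  echo "$line" | cut -d "," -f 2   # buffer filename, exactly as stored
  echo "$line" | cut -d "," -f 3   # lihtc filename, exactly as stored
}

# Simulate array task 2; prints "buffer_0002.shp" and "lihtc_a.shp",
# with the double quotes still attached.
lookup_fields 2 "$csv"
rm -f "$csv"
```

Under these assumptions the values come back with their double quotes intact, so those quote characters would also be part of whatever reaches the R script.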

For reference:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.9   ggmap_3.0.0   ggplot2_3.3.6 sf_1.0-7     

loaded via a namespace (and not attached):
 [1] xfun_0.28           tidyselect_1.1.2    purrr_0.3.4         lattice_0.20-45     colorspace_2.0-3    vctrs_0.4.1         generics_0.1.2     
 [8] htmltools_0.5.2     s2_1.0.7            utf8_1.2.2          rlang_1.0.2         e1071_1.7-9         pillar_1.7.0        glue_1.6.2         
[15] withr_2.5.0         DBI_1.1.1           sp_1.4-6            wk_0.5.0            jpeg_0.1-9          lifecycle_1.0.1     plyr_1.8.7         
[22] stringr_1.4.0       munsell_0.5.0       gtable_0.3.0        RgoogleMaps_1.4.5.3 evaluate_0.15       knitr_1.36          fastmap_1.1.0      
[29] curl_4.3.2          class_7.3-19        fansi_1.0.3         highr_0.9           Rcpp_1.0.8.3        KernSmooth_2.23-20  scales_1.2.0       
[36] classInt_0.4-3      farver_2.1.0        rjson_0.2.20        png_0.1-7           digest_0.6.29       stringi_1.7.6       grid_4.1.0         
[43] cli_3.3.0           tools_4.1.0         bitops_1.0-7        magrittr_2.0.3      proxy_0.4-26        tibble_3.1.7        crayon_1.5.1       
[50] tidyr_1.2.0         pkgconfig_2.0.3     ellipsis_0.3.2      assertthat_0.2.1    rmarkdown_2.11      httr_1.4.2          rstudioapi_0.13    
[57] R6_2.5.1            units_0.7-2         compiler_4.1.0

Answer 1

I found and resolved 3 issues:

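The maximum array size refers to the entire array. The throttle only sets how many tasks are scheduled at a time. So I needed to split my 3,086 tasks into 4 separate batches. This can be done in the .sh file as follows: #SBATCH -a 1-999 for job 1, #SBATCH -a 1000-1999 for job 2, and so on.
The R script needs to capture the arguments from the command line. The script now begins with:

args = commandArgs(trailingOnly=TRUE)
shp_filename <- args[1]
lihtc_filename <- args[2]

The submit file was sending the arguments with quotes around them, which prevented paste0 from creating usable filenames. Neither noquote() nor print(x, quotes = F) removed those quotes, since both only affect how a value is displayed and the quote characters here were part of the string itself. But gsub('"', '', x) worked.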
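
The first and third fixes can also be sketched on the shell side. This is a hedged illustration rather than what I actually ran: array_chunks and strip_quotes are hypothetical helper names, the chunk size of 1000 is an assumption (the limit your cluster accepts may differ, and so may the exact boundaries), and the quotes are stripped with tr in the shell instead of with gsub in R. It relies on sbatch letting --array on the command line override the #SBATCH -a line in the script.

```shell
#!/bin/bash
# Print one "first-last" index range per line, covering 1..total in chunks
# of at most chunk_size tasks.
array_chunks() {
  local total=$1 chunk_size=$2 first=1 last
  while [ "$first" -le "$total" ]; do
    last=$(( first + chunk_size - 1 ))
    if [ "$last" -gt "$total" ]; then
      last=$total
    fi
    echo "${first}-${last}"
    first=$(( last + 1 ))
  done
}

# Remove every double quote from a CSV field before it is handed to Rscript.
strip_quotes() {
  printf '%s\n' "$1" | tr -d '"'
}

# Dry run: print the submissions instead of executing them.
# Drop the leading "echo" to submit for real.
for range in $(array_chunks 3086 1000); do
  echo sbatch --array="$range" lihtc_array_example.sh
done

# Example: a quoted field as it comes out of cut (filename is made up).
strip_quotes '"buffer_0001.shp"'
```

With these numbers the loop prints four sbatch lines, and the same .sh file can be reused for every batch without editing the -a line by hand.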

As far as I am concerned, this is an inelegant/lazy way to parallelize, but it works. Case closed.

Answer 1: accepted, 2022-07-07 19:20:20