Running an entire script in parallel in R

Question

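I have a question about parallelizing an entire script. My script imports data, randomly splits the dataframe into training and validation sets, then does the preprocessing and validation. I want to run the same script repeatedly with many different seeds.

Is it possible to do this in parallel? The runs do not interfere with each other.

Thanks a lot!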

seeds <- c(2343242,324256,764865,3524526,574574,75624,15436,674767,4325265,2462626,
           245264,647474,2465374,4253532,5787462,35636,357484,34524,74859,1352637)

for (i in 1:length(seeds))
  {
  set.seed(seeds[i])
  seed <- seeds[i]
  print(seeds[i])
  
  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
  
}

Answer 1

A literal one-to-one translation of your loop:

library(future.apply)
plan(multisession)

seeds <- c(2343242,324256,764865,3524526,574574,75624,15436,674767,4325265,2462626,
           245264,647474,2465374,4253532,5787462,35636,357484,34524,74859,1352637)

empty <- future_lapply(seeds, function(seed) {
  set.seed(seed)
  print(seed)
  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
})

Unless those particular seeds are essential in some way, you probably want to use statistically sound parallel RNG, which you get automatically if you do:

library(future.apply)
plan(multisession)

set.seed(42) ## Optional to fix the initial seed
n <- 20L     ## Number of runs

empty <- future_lapply(1:n, function(ii) {
  print(.Random.seed)
  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
}, future.seed = TRUE)

Since ii is not used here, the latter can equally be written with the future version of base::replicate(), which uses parallel RNG by default:

library(future.apply)
plan(multisession)

set.seed(42) ## Optional to fix the initial seed
n <- 20L     ## Number of runs

empty <- future_replicate(n, {
  print(.Random.seed)
  print("begin import")
  source(file = "import.r")
  print("preprocessing")
  source(file = "preProc.r")
  print("normal")
  source(file = "algorithms and datasets.r")
  print("resampled")
  source(file = "algorithms and datasets up down.r")
})
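As background on what "statistically sound parallel RNG" refers to: with parallel seeding enabled, future.apply draws from L'Ecuyer-CMRG random number streams, one per iteration. The following is only a rough base-R sketch of that idea using the parallel package that ships with R. It is not how future.apply is implemented internally, and the run count and initial seed are illustrative assumptions.

```r
library(parallel)  # ships with R; provides nextRNGStream()

# Assumptions for illustration only: 3 runs, initial seed 42.
n <- 3L
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# Derive one independent stream seed per run from the initial state.
streams <- vector("list", n)
s <- .Random.seed
for (i in seq_len(n)) {
  streams[[i]] <- s
  s <- nextRNGStream(s)
}

# Each run installs its own stream before drawing random numbers,
# so the result does not depend on which worker executes which run.
draws <- lapply(streams, function(stream) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(2)
})
```

Because every run starts from its own stream, rerunning the sketch with the same initial seed should reproduce the same draws for each run, regardless of execution order.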

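PS. It is not clear to me how you tell the results of the different runs apart. Perhaps you rely on seed to save to different files inside those scripts.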
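One possible way to keep the runs apart, sketched under assumptions: the sourced script reads a variable named seed and defines an object named result (both hypothetical here, as are the temporary script and the output file names). Note that source() evaluates in the global environment by default, so a script that needs a function argument such as seed has to be sourced with local = TRUE.

```r
# Hypothetical stand-in for one of the sourced scripts: it expects a
# variable `seed` to exist and defines `result`.
script <- tempfile(fileext = ".r")
writeLines("result <- paste('run with seed', seed)", script)

# Hypothetical output directory for the per-seed result files.
out_dir <- tempfile("runs_")
dir.create(out_dir)

run_one <- function(seed) {
  set.seed(seed)
  # local = TRUE evaluates the script in this function's environment, so it
  # can see `seed`; the default (local = FALSE) uses the global environment.
  source(script, local = TRUE)
  out_file <- file.path(out_dir, sprintf("result_%d.rds", seed))
  saveRDS(result, out_file)
  out_file
}

seeds <- c(2343242, 324256, 764865)
files <- vapply(seeds, run_one, character(1))
```

The sketch runs sequentially with vapply() so that it needs only base R. The same run_one() function could presumably be passed to future_lapply(), since each call writes to its own file.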

Answer 1
1 accepted 2020-08-11 20:22:06
