
R foreach %dopar% Results

I am trying to run a function using foreach and %dopar% that passes its results back into itself on each iteration. A small example is below:

require(doParallel)

test_function <- function(data)
{
  result <- rbind(data, data)
  return(result)
}

test_data <- mtcars

cl <- makeCluster(4)
registerDoParallel(cl)
results <- foreach(i = 1:10) %dopar%
{
  aa <- test_function(test_data)
  aa$iteration <- i
  test_data <- aa
  return(aa)
}
stopCluster(cl)

What I am hoping to see in results is a list of ten data frames, each with twice as many rows as the one before.

Re-defining test_data inside the foreach body does not appear to do this, whereas it does when I run the same commands in a standard for loop, like so:

results <- list()
for(i in 1:10)
{
  aa <- test_function(test_data)
  aa$iteration <- i
  test_data <- aa
  results[[i]] <- aa
}

I would appreciate any insight into what I'm overlooking here.

If I understand your question correctly, the problem is that you cannot update the global variable test_data from within the parallelised loop.

To see why, consider what actually happens inside the parallelised loop: several workers perform operations at the same time, each with its own separate, locally scoped variables. (With makeCluster, the workers are separate R processes rather than threads, and each receives its own copy of test_data.) If they all had unprotected access to a shared global variable (or shared memory), its contents could be corrupted, and there are several ways that corruption might happen.
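A minimal sketch of this behaviour, assuming a two-worker cluster created with makeCluster; the names counter and seen are invented for illustration. It is only meant to show that an assignment inside the loop body leaves the variable in the calling session untouched. What each iteration itself observes depends on the backend and on how tasks are scheduled, so no particular values of seen are claimed here.

```r
library(doParallel)

cl <- makeCluster(2)  # two workers; an arbitrary choice for this sketch
registerDoParallel(cl)

counter <- 0  # hypothetical variable living in the calling session

# Each iteration assigns to `counter`, but the assignment happens on a worker,
# which only ever holds a copy of the variable.
seen <- foreach(i = 1:4, .combine = c) %dopar% {
  counter <- counter + 1
  counter
}

stopCluster(cl)

# The copy in the calling session was never modified.
counter
```

The same thing happens to test_data in the question: the line test_data <- aa changes a worker's copy, and the calling session never sees it.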

Preventing this is the raison d'être of concurrency control structures such as semaphores. These let you do what you are attempting, but they take some care to use correctly.

However, base R does not provide them. Hence, it makes sense that the global variable test_data is shielded from being modified in a non-thread-safe manner: an assignment inside the loop body only changes the worker's own copy, never the variable in your session. In effect, this protects your data.

The solution is either to rewrite your code so that it no longer tries to update global variables (if you still want some form of parallel processing), or to switch to a traditional, sequential for loop (as some commenters have already suggested).
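One possible rewrite of the first kind, offered as a sketch rather than the only way to do it: make every iteration independent by starting from the unmodified test_data and applying test_function i times inside the loop body. The local name doubled and the worker count are assumptions made for this example.

```r
library(doParallel)

test_function <- function(data) {
  rbind(data, data)
}

test_data <- mtcars

cl <- makeCluster(2)  # worker count is an arbitrary choice for this sketch
registerDoParallel(cl)

results <- foreach(i = 1:10) %dopar% {
  # Start from the original input every time; test_data is never reassigned,
  # so no iteration depends on what another iteration did.
  doubled <- test_data
  for (k in seq_len(i)) {
    doubled <- test_function(doubled)
  }
  doubled$iteration <- i
  doubled
}

stopCluster(cl)

# Element i has nrow(mtcars) * 2^i rows.
sapply(results, nrow)
```

The trade-off is repeated work: iteration i redoes the doubling that iteration i - 1 already performed. When each step genuinely needs the previous step's output, the sequential for loop from the question remains the more natural fit.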
