
R foreach %dopar% Results

I am trying to run a function using foreach and %dopar% that will pass its results back into itself for each iteration. Small example below:

require(doParallel)

test_function <- function(data)
{
  result <- rbind(data, data)
  return(result)
}

test_data <- mtcars

cl <- makeCluster(4)
registerDoParallel(cl)
results <- foreach(i = 1:10) %dopar%
{
  aa <- test_function(test_data)
  aa$iteration <- i
  test_data <- aa
  return(aa)
}
stopCluster(cl)

What I am hoping to see in results is a list of ten data frames, each one sequentially doubling in number of rows.

It appears that re-defining test_data within the foreach function does not do this, as it would if I just ran these commands within a standard for loop - like so:

results <- list()
for(i in 1:10)
{
  aa <- test_function(test_data)
  aa$iteration <- i
  test_data <- aa
  results[[i]] <- aa
}

Would appreciate any insight into what I'm overlooking here.

If I understand your question correctly, the problem is that you cannot update the global variable test_data from within the parallelised loop.

To understand why, consider what is actually happening within the parallelised loop: multiple workers perform operations in parallel, each with its own separate, locally scoped variables. (With makeCluster, the workers are separate R processes rather than threads, and each one receives its own copy of test_data, so an assignment inside the loop body changes only that worker's copy.) If the workers had access to a global variable (or shared memory) without any protection controlling access to it, the stored value could be corrupted, and there are several different ways this corruption might happen.
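A minimal sketch of this isolation, assuming the doParallel setup from the question. The variable name counter and the worker count of 2 are made up for illustration. Each task assigns to counter, but the copy in the main session is never touched:

```r
library(doParallel)

cl <- makeCluster(2)   # two worker processes (assumed count, for illustration)
registerDoParallel(cl)

counter <- 0           # lives in the main R session

out <- foreach(i = 1:4, .combine = c) %dopar% {
  counter <- counter + 1   # changes only the copy held by the worker process
  counter
}

stopCluster(cl)

# The main session's variable is unchanged by anything the workers did.
print(counter)
```

The values collected in out may depend on how tasks were distributed across the workers, so they are not shown here. The point is only that counter in the main session is still 0.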

Preventing this is the raison d'être of concurrency control structures like semaphores. These allow users to do what you are trying to do, but require some care to use correctly.

However, they are not available in base R. Hence, it makes sense that R keeps the global variable test_data from being modified in a non-thread-safe manner. It's actually trying to protect your data.

The solution is to rewrite your code to remove any attempt to update global variables (if you still want to do any kind of parallel processing) or switch to using a traditional, sequential for loop (as some commenters have already suggested).
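One possible parallel rewrite, offered as a sketch rather than the definitive fix: make every iteration independent by having it rebuild its own input from the original, unmodified test_data. The inner loop and the name d are assumptions introduced here. It does redundant work (iteration i repeats the doubling i times), so it only pays off if the per-iteration work is cheap to recompute or genuinely expensive elsewhere.

```r
library(doParallel)

# Same helper and data as in the question.
test_function <- function(data) {
  rbind(data, data)
}
test_data <- mtcars

cl <- makeCluster(2)   # worker count chosen arbitrarily for this sketch
registerDoParallel(cl)

results <- foreach(i = 1:10) %dopar% {
  d <- test_data                 # always start from the original data
  for (k in seq_len(i)) {
    d <- test_function(d)        # double the rows i times
  }
  d$iteration <- i
  d                              # no dependence on any other iteration
}

stopCluster(cl)

# Row counts double from one element to the next.
print(sapply(results, nrow))
```

Because no iteration reads anything written by another, the order in which workers pick up tasks no longer matters. If each step truly needs the previous step's output and cannot be recomputed like this, the sequential for loop from the question is the simpler choice.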
