
R: foreach loop with nested for loops not looping through

I am trying to get some nested loops to run faster in R (on Windows); the master loop runs through a large dataset (i.e. an 800000 x 3 matrix).

After trying to remove the temporary variables from the intermediate loops, I am now trying to get R to run the loop on the 4 cores of my machine instead of 1.

Thus I did the following:

install.packages('doSNOW')
library(doSNOW)
library(foreach)

c1 <- makeCluster(4)
registerDoSNOW(c1)

foreach(k = 1:length(big_data[,1])) %dopar% {

  x <- big_data[k,1]
  y <- big_data[k,2]

  for (i in 1:length(data_2[,1])) {
    if ( … ) {  # condition on x and y
      new_data1 <- …
      new_data2 <- …
      new_data3 <- …
      for (j in 1:length(new_data3)) {
        # do something
      }
    }
  }
  rm(new_data1)
  rm(new_data2)
  rm(new_data3)
  gc()
}
stopCluster(c1)

My issue is that R keeps running, and when I stop the script manually after, say, 10 minutes, I still have k=1 (without any explicit error from R). While R runs, I can see that it is using all 4 cores.

In comparison, when I use a simple for loop instead of foreach, only 1 core is used, but at least after 10 minutes my index k has increased and results are being stored.

So it appears that either foreach is much slower than for (which doesn't make sense), or foreach just doesn't get into the inner loops for some reason.

Any ideas on how to overcome this problem would be appreciated.

When you stop execution, there is no single value of k to examine. A different k is passed to each of the nodes, so at the same moment one node might be at k=3 while another is at k=100. You don't have access to these different values of k. In fact, if you're using %dopar%, the k you see when you stop execution has nothing to do with the k in foreach: it's the same k you had before starting.

For example, try running this:

k <- 999
foreach(k=1:3) %dopar% { Sys.sleep(2) }
k

You'll get 999.

(On the other hand, if you were to try foreach(k=1:3) %do% { ... }, you'd get k=3, just as if you'd used k in a for loop.)
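To make both points concrete, here is a minimal sketch that records which worker process handled each k and then checks the caller's k after %dopar% and after %do%. The 2-worker cluster, the 6 iterations, and the names cl, res, k_after_dopar and k_after_do are assumptions made up for illustration; they are not part of the original question.

```r
library(doSNOW)  # also attaches foreach and snow

cl <- makeCluster(2)  # hypothetical small cluster, just for the demo
registerDoSNOW(cl)

k <- 999

# Each task reports its own k together with the PID of the worker that ran it.
res <- foreach(k = 1:6, .combine = rbind) %dopar% {
  data.frame(k = k, pid = Sys.getpid())
}
k_after_dopar <- k  # the caller's k is not modified by the workers

stopCluster(cl)

# With %do% the body is evaluated in the calling environment instead.
invisible(foreach(k = 1:3) %do% { k })
k_after_do <- k

print(res)            # one row per task; the pid column shows the worker used
print(k_after_dopar)
print(k_after_do)
```

The pid column should differ from the PID of your interactive session, which is why inspecting k there tells you nothing about how far the workers have got.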

Your tasks are indeed running. You'll have to either wait it out or somehow speed up your (rather complex) loop.
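If you'd rather watch progress than wait blindly, doSNOW accepts a progress callback through the .options.snow argument of foreach. The following is only a sketch under assumptions: the task count n, the Sys.sleep stand-in for the real work, and the names cl, pb and opts are invented for illustration.

```r
library(doSNOW)  # also attaches foreach and snow

cl <- makeCluster(2)  # hypothetical cluster size
registerDoSNOW(cl)

n <- 20  # stand-in for the number of rows in big_data
pb <- txtProgressBar(max = n, style = 3)

# doSNOW calls this on the master with the number of tasks completed so far.
opts <- list(progress = function(done) setTxtProgressBar(pb, done))

res <- foreach(k = 1:n, .options.snow = opts) %dopar% {
  Sys.sleep(0.1)  # placeholder for the real per-row work
  k               # the value of the block is what foreach collects
}

close(pb)
stopCluster(cl)
```

The bar advances as tasks finish, so a loop that is merely slow can be told apart from one that is stuck.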

