
How to show the progress of code in parallel computation in R?

I am working with a large dataset, and some functions take hours to run. How can I show the progress of the code, either as a progress bar or as a counter (1, 2, 3, ..., 100)? I also want to store the result as a data frame with two columns. Here is an example. Thanks.

require(foreach)
require(doParallel)
require(Kendall)

cores=detectCores()
cl <- makeCluster(cores-1)
registerDoParallel(cl)

mydata=matrix(rnorm(8000*500),ncol = 500)
result=as.data.frame(matrix(nrow = 8000,ncol = 2))
pb <- txtProgressBar(min = 1, max = 8000, style = 3)

foreach(i=1:8000,.packages = "Kendall",.combine = rbind) %dopar%         
{
  abc=MannKendall(mydata[i,])
  result[i,1]=abc$tau
  result[i,2]=abc$sl
  setTxtProgressBar(pb, i)
}
close(pb)
stopCluster(cl)

However, when I run the code, no progress bar shows up and the result is not right. Any suggestions? Thanks.

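The doSNOW package supports progress bars, while doParallel does not. Here's a way to add a progress bar to your example. Note that the loop body returns a one-row data frame for .combine to stack, rather than assigning into result: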

require(doSNOW)
require(Kendall)

# Start a socket cluster and register it as the foreach backend
cores <- parallel::detectCores()
cl <- makeSOCKcluster(cores)
registerDoSNOW(cl)

mydata <- matrix(rnorm(8000*500), ncol=500)

# The progress function runs on the master; n is the number of finished tasks
pb <- txtProgressBar(min=1, max=8000, style=3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress=progress)

# Each task returns a one-row data frame; rbind stacks them into the result
result <- 
  foreach(i=1:8000, .packages="Kendall", .options.snow=opts,
          .combine='rbind') %dopar% {
    abc <- MannKendall(mydata[i,])
    data.frame(tau=abc$tau, sl=abc$sl)
  }
close(pb)
stopCluster(cl)
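
As for why the loop in the question fails: with a socket cluster the loop body runs in separate worker processes, so setTxtProgressBar(pb, i) and result[i,1]=... act on copies inside the workers and never touch the master's pb or result. Below is a minimal sketch of that point using only the base parallel package. The small 20 x 30 matrix and the use of cor.test(..., method = "kendall") as a stand-in for Kendall::MannKendall are assumptions made for illustration, not the method used in the answers.

```r
library(parallel)

set.seed(1)
# Small stand-in for the 8000 x 500 matrix so the sketch runs in seconds.
mydata <- matrix(rnorm(20 * 30), ncol = 30)

cl <- makeCluster(2)

# Each task reports the process it ran in. None of them is the master process,
# which is why master-side objects (pb, result) never change.
worker_pids <- unique(unlist(parLapply(cl, 1:20, function(i) Sys.getpid())))
runs_on_master <- Sys.getpid() %in% worker_pids

# Return one small named vector per row instead of assigning into `result`.
# cor.test() against the time index is a base-R stand-in (assumption) for
# Kendall::MannKendall; it is not the same implementation.
rows <- parLapply(cl, 1:nrow(mydata), function(i, m) {
  ct <- cor.test(m[i, ], seq_len(ncol(m)), method = "kendall")
  c(tau = unname(ct$estimate), sl = ct$p.value)
}, m = mydata)

stopCluster(cl)

# Combine on the master into a two-column data frame.
result <- as.data.frame(do.call(rbind, rows))
```

The same pattern applies to foreach: whatever the body evaluates to is sent back to the master, and .combine decides how those pieces are assembled.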

I think the pbapply package also does the job.

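require(parallel)
require(pbapply)

mydata <- matrix(rnorm(8000*500), ncol = 500)

cores <- detectCores()
cl <- makeCluster(cores-1)
# Workers start with empty sessions: ship the data and load Kendall on each
parallel::clusterExport(cl = cl, varlist = c("mydata"))
parallel::clusterEvalQ(cl = cl, library(Kendall))

# pblapply() draws the progress bar on the master while the workers compute
result <- pblapply(cl = cl,
         X = 1:8000,
         FUN = function(i){
  abc <- MannKendall(mydata[i,])
  # One-row data frame per task: column 1 is tau, column 2 is sl
  row <- as.data.frame(matrix(nrow = 1, ncol = 2))
  row[1,1] <- abc$tau
  row[1,2] <- abc$sl
  return(row)
})

# Stack the 8000 one-row data frames and name the two columns
result <- dplyr::bind_rows(result)
names(result) <- c("tau", "sl")
stopCluster(cl)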

According to the pbapply documentation, when a socket cluster is passed via cl, pblapply() calls parLapply():

Parallel processing can be enabled through the cl argument. parLapply is called when cl is a 'cluster' object, mclapply is called when cl is an integer.
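
The two forms of cl described in that passage can be compared side by side. This is a small sketch, assuming pbapply is installed; the square() helper and the input 1:100 are made up for illustration. The integer form relies on forking, so it only runs in parallel on Unix-alikes.

```r
library(parallel)
library(pbapply)

# Hypothetical toy task, used only to compare the two dispatch paths.
square <- function(i) i^2

# cl as a cluster object: the work goes through parLapply().
cl <- makeCluster(2)
via_cluster <- pblapply(1:100, square, cl = cl)
stopCluster(cl)

# cl as an integer: the work goes through mclapply() (forked processes).
via_fork <- pblapply(1:100, square, cl = 2L)

# Both paths return the same list, in input order.
same <- identical(via_cluster, via_fork)
```

With a cluster object, anything the function needs (data, packages) has to be sent to the workers first, as the clusterExport() and clusterEvalQ() calls in the answer do. Forked workers inherit the master's session, so that step is not needed with an integer cl.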
