
R parallel package - performance very slow in my toy example

I am trying to sample from two vectors with replacement (1000 draws each) and calculate the ratio of their means, repeating this process 10,000 times.

I wrote a parallel version, but it's taking much longer than using a simple for loop on a single machine.

library(parallel)

ratio_sim_par <- function(x1, x2, nrep = 1000) {

  # Initiate cluster, leaving one core free for other operations
  cl <- makeCluster(detectCores() - 1)

  clusterExport(cl, varlist = c("x1", "x2", "nrep"), envir = environment())

  Tboot <- parLapply(cl, 1:nrep, function(i) {

    n1 <- length(x1)
    n2 <- length(x2)

    xx1 <- sample(x1, n1, replace = TRUE)  # bootstrap sample of size n1 from x1
    xx2 <- sample(x2, n2, replace = TRUE)  # bootstrap sample of size n2 from x2
    return(mean(xx1) / mean(xx2))
  })

  stopCluster(cl)

  return(unlist(Tboot))

}

ratio_sim_par(x1, x2, 10000)

The run time is unbearable. Can anyone help me understand the mistake I'm making? Thanks
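For reference, here is a minimal sketch of the serial version I'm comparing against (the question only mentions "simple for loops", so `replicate` and the example data are assumptions for illustration):

```r
# Serial baseline: same bootstrap, no cluster setup or scheduling overhead.
ratio_sim <- function(x1, x2, nrep = 1000) {
  replicate(nrep, {
    xx1 <- sample(x1, length(x1), replace = TRUE)  # resample x1 with replacement
    xx2 <- sample(x2, length(x2), replace = TRUE)  # resample x2 with replacement
    mean(xx1) / mean(xx2)
  })
}

set.seed(1)
x1 <- rnorm(100, mean = 5)    # example data (assumed)
x2 <- rnorm(100, mean = 10)
ratios <- ratio_sim(x1, x2, 10000)
```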

Distributing tasks to different workers carries significant overhead, which can cancel out any gains you make from parallelizing your script. In your case, parLapply has to dispatch 10,000 tiny tasks, so you are probably spending more resources scheduling each task than actually doing the resampling. Try something like this with a non-parallel version of ratio_sim_par:

mclapply(1:10000, function(i) ratio_sim_par(x1, x2, nrep = 1000), mc.cores = n_cores)

mclapply will split the job across as many cores as you make available and fork only once. I'm using mclapply instead of parLapply because I'm used to it and it doesn't require as much setup.
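If you want to stay with a cluster-based approach, the same idea works with parLapply: hand each worker one large chunk of iterations instead of one iteration per task, so the scheduling cost is paid only a handful of times. A sketch (the function name, chunking scheme, and example data are illustrative, not from the original post):

```r
library(parallel)

ratio_sim_chunked <- function(x1, x2, nrep = 10000, n_cores = 2) {
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl))  # always release the workers
  clusterExport(cl, c("x1", "x2"), envir = environment())

  # Split nrep iterations into one chunk per worker;
  # the first chunk absorbs any remainder.
  reps_per_core <- rep(nrep %/% n_cores, n_cores)
  reps_per_core[1] <- reps_per_core[1] + nrep %% n_cores

  # Each worker runs its whole chunk in a single task.
  chunks <- parLapply(cl, reps_per_core, function(k) {
    replicate(k, mean(sample(x1, replace = TRUE)) /
                 mean(sample(x2, replace = TRUE)))
  })
  unlist(chunks)
}

set.seed(1)
x1 <- rnorm(100, mean = 5)    # example data (assumed)
x2 <- rnorm(100, mean = 10)
ratios <- ratio_sim_chunked(x1, x2, nrep = 10000, n_cores = 2)
```

With only one task per worker, the per-task overhead is amortized over thousands of resamples instead of being paid 10,000 times.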
