
R parallel package - performance very slow in my toy example

I am trying to sample from two vectors with replacement (1000 draws each) and calculate the ratio of their means, repeating this process 10,000 times.

I wrote a parallel version, but it's taking much longer than using a simple for loop on a single machine.

library(parallel)

ratio_sim_par <- function(x1, x2, nrep = 1000) {

  # Initiate cluster, leaving one core free for other operations
  cl <- makeCluster(detectCores() - 1)

  clusterExport(cl, varlist = c("x1", "x2", "nrep"), envir = environment())

  Tboot <- parLapply(cl, 1:nrep, function(i) {

    n1 <- length(x1)
    n2 <- length(x2)

    xx1 <- sample(x1, n1, replace = TRUE)  # bootstrap sample of size n1 from x1
    xx2 <- sample(x2, n2, replace = TRUE)  # bootstrap sample of size n2 from x2
    return(mean(xx1) / mean(xx2))
  })

  stopCluster(cl)

  return(unlist(Tboot))

}

ratio_sim_par(x1, x2, 10000)

The run time is unbearable. Can anyone help me understand the mistake I'm making? Thanks
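For reference, here is a minimal sketch of the serial version I'm comparing against (the question only mentions "simple for loops", so `replicate` and the example data are assumptions for illustration):

```r
# Serial baseline: same bootstrap, no cluster setup or scheduling overhead.
ratio_sim <- function(x1, x2, nrep = 1000) {
  replicate(nrep, {
    xx1 <- sample(x1, length(x1), replace = TRUE)  # resample x1 with replacement
    xx2 <- sample(x2, length(x2), replace = TRUE)  # resample x2 with replacement
    mean(xx1) / mean(xx2)
  })
}

set.seed(1)
x1 <- rnorm(100, mean = 5)    # example data (assumed)
x2 <- rnorm(100, mean = 10)
ratios <- ratio_sim(x1, x2, 10000)
```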

Distributing tasks to different workers carries significant overhead, which can cancel out any gains you make from parallelizing your script. In your case, parLapply has to dispatch 10,000 tiny tasks, so you are probably spending more resources scheduling each task than actually doing the resampling. Try something like this with a non-parallel version of ratio_sim_par:

mclapply(1:10000, function(i) ratio_sim_par(x1, x2, nrep = 1000), mc.cores = n_cores)

mclapply will split the job across as many cores as you make available and fork only once. I'm using mclapply instead of parLapply because I'm used to it and it doesn't require as much setup.
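If you want to stay with a cluster-based approach, the same idea works with parLapply: hand each worker one large chunk of iterations instead of one iteration per task, so the scheduling cost is paid only a handful of times. A sketch (the function name, chunking scheme, and example data are illustrative, not from the original post):

```r
library(parallel)

ratio_sim_chunked <- function(x1, x2, nrep = 10000, n_cores = 2) {
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl))  # always release the workers
  clusterExport(cl, c("x1", "x2"), envir = environment())

  # Split nrep iterations into one chunk per worker;
  # the first chunk absorbs any remainder.
  reps_per_core <- rep(nrep %/% n_cores, n_cores)
  reps_per_core[1] <- reps_per_core[1] + nrep %% n_cores

  # Each worker runs its whole chunk in a single task.
  chunks <- parLapply(cl, reps_per_core, function(k) {
    replicate(k, mean(sample(x1, replace = TRUE)) /
                 mean(sample(x2, replace = TRUE)))
  })
  unlist(chunks)
}

set.seed(1)
x1 <- rnorm(100, mean = 5)    # example data (assumed)
x2 <- rnorm(100, mean = 10)
ratios <- ratio_sim_chunked(x1, x2, nrep = 10000, n_cores = 2)
```

With only one task per worker, the per-task overhead is amortized over thousands of resamples instead of being paid 10,000 times.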
