Parallel computation in R for shell commands

Question

I have a use case where I am trying to call my Rscript over a bunch of files. I have written the snippet below:

# Run modelScriptName once per input file, one file after another.
# modelScriptName, the folder variables and the option values are defined earlier.
for (i in seq_along(fileNames)) {

  generateTolerancesCommand <- paste(c("Rscript ", modelScriptName,
                                       " --inp=", paste(c("'", dimensionsFolder, "/", fileNames[i], "'"), collapse = ""),
                                       " --sea=", seasonal,
                                       " --freq=", freq,
                                       " --outp=", paste(c("'", outputFolder, "/", "'"), collapse = ""),
                                       " --tp=", tp,
                                       " --sd=", sd,
                                       " --end=", end,
                                       " --op=", op,
                                       " --tls=", tls,
                                       " --pts=", pts,
                                       " --userf=", paste(c("'", dimensionsFeedbackFolder, "/", fileNames[i], "'"), collapse = "")
                                      ), collapse = "")
  system(generateTolerancesCommand)  # waits until this Rscript run has finished
}
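The loop body depends on the loop only through fileNames[i], so it can be factored into a function of a single file name; that one-argument form is what any parallel map (foreach, parLapply, ...) needs. The sketch below assumes a hypothetical helper buildToleranceCommand() and a hypothetical list cfg holding the values the loop currently reads from the global environment. It swaps the hand-written "'" quoting for shQuote(type = "sh"), which produces the same single-quoted paths for ordinary names.

```r
# Hypothetical helper: builds the same Rscript command as the loop body above,
# for a single file name. cfg is a hypothetical list holding the values that
# the loop reads from the global environment (script name, folders, options).
buildToleranceCommand <- function(fileName, cfg) {
  # type = "sh" wraps a path in single quotes, like the hand-written "'" above,
  # and still yields a valid shell word if the path contains a single quote.
  q <- function(x) shQuote(x, type = "sh")
  paste(
    "Rscript", cfg$modelScriptName,
    paste0("--inp=", q(file.path(cfg$dimensionsFolder, fileName))),
    paste0("--sea=", cfg$seasonal),
    paste0("--freq=", cfg$freq),
    paste0("--outp=", q(paste0(cfg$outputFolder, "/"))),
    paste0("--tp=", cfg$tp),
    paste0("--sd=", cfg$sd),
    paste0("--end=", cfg$end),
    paste0("--op=", cfg$op),
    paste0("--tls=", cfg$tls),
    paste0("--pts=", cfg$pts),
    paste0("--userf=", q(file.path(cfg$dimensionsFeedbackFolder, fileName)))
  )
}
```

In the sequential loop this becomes system(buildToleranceCommand(fileNames[i], cfg)); a parallel version only has to call the same function once per file.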

This loop works, but each iteration takes around 3-4 minutes, so the whole script currently takes about 2 hours. I think I can do better by running the iterations in parallel: each iteration launches an independent R run that works on an independent data set. I have read about parallel packages in R such as parallel and doParallel, but I cannot figure out the best way to apply them to my use case. Can someone experienced with this suggest an approach?
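Because every iteration is a separate shell command, the base parallel package can also run the loop body directly, as an alternative to foreach. This is a sketch, not the asker's code: makeDemoCommand() is a hypothetical stand-in that starts a real Rscript process (located via R.home("bin")) but only prints the file name, and runOne()/runAll() are illustrative names. Each worker blocks in system() while its own job runs, so up to nWorkers jobs run at the same time.

```r
library(parallel)

# Hypothetical stand-in for the real per-file command: it starts a separate
# Rscript process, like the command in the question, but only prints the name.
makeDemoCommand <- function(fileName) {
  rscript <- file.path(R.home("bin"), "Rscript")
  paste(shQuote(rscript), "-e", shQuote(sprintf("cat(%s)", deparse(fileName))))
}

# Build and run one command on a worker; returns the command's exit status.
runOne <- function(fileName, build) {
  system(build(fileName))
}

# Spread the files over nWorkers R worker processes (one core left free by
# default; falls back to 1 worker if the core count cannot be detected).
runAll <- function(fileNames, build,
                   nWorkers = max(1L, detectCores() - 1L, na.rm = TRUE)) {
  cl <- makeCluster(nWorkers)
  on.exit(stopCluster(cl))  # release the workers even if a job fails
  statuses <- parLapply(cl, fileNames, runOne, build = build)
  unlist(statuses)          # one exit status per file, 0 means success
}
```

With a real command builder in place of makeDemoCommand, runAll(fileNames, makeDemoCommand) returns one exit status per file. parLapply() splits the files into fixed chunks up front; if run times vary a lot between files, the load-balanced parLapplyLB() may finish sooner.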

Answer 1

If you don't need to create variables in the environment and only write output files, you can simply replace your loop with foreach.

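library(foreach)
# Start one worker per core, leaving one core free for the rest of the system
cl <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cl)
foreach(i = seq_along(fileNames), .combine = 'c') %dopar% {

  ## PUT YOUR CODE HERE: the body of your for loop, i.e. build
  ## generateTolerancesCommand for fileNames[i] and call system() on it

  NULL
}
parallel::stopCluster(cl)  # shut the workers down once all files are done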

The NULL at the end of the body, together with .combine = 'c', is just there so that foreach returns nothing (a single NULL): foreach works more like lapply than like a for loop, so by default it collects the value of every iteration into a list. You can learn more in this tutorial.
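A small sketch of that return-value behaviour. It uses %do%, which runs the iterations sequentially and needs no registered cluster; %dopar% combines the results in the same way. The message() call is only a hypothetical stand-in for the system() call.

```r
library(foreach)

# Without .combine, foreach collects every iteration's value in a list,
# just as lapply() would.
asList <- foreach(i = 1:3) %do% i^2

# With .combine = 'c' and a body that ends in NULL, the values are combined
# with c(), and c(NULL, NULL, NULL) is NULL: only the side effects remain.
sideEffectsOnly <- foreach(i = 1:3, .combine = 'c') %do% {
  message("processing file ", i)  # stand-in for system(command)
  NULL
}
```

Here asList is a list of three numbers, while sideEffectsOnly is NULL, so nothing large is carried back from the workers.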

1 answer. Answer 1: accepted, score 2, 2017-10-12 09:30:58
