I have a use case where I am trying to call my Rscript over a bunch of files. I have written the snippet below:
for(i in 1:length(fileNames)){
  generateTolerancesCommand = paste(c("Rscript ",modelScriptName,
    " --inp=",paste(c("'",dimensionsFolder, "/", fileNames[i],"'"), collapse = ""),
    " --sea=",seasonal,
    " --freq=",freq,
    " --outp=",paste(c("'",outputFolder,"/","'"), collapse=""),
    " --tp=",tp,
    " --sd=",sd,
    " --end=",end,
    " --op=",op,
    " --tls=",tls,
    " --pts=",pts,
    " --userf=",paste(c("'",dimensionsFeedbackFolder, "/", fileNames[i],"'"), collapse = "")
  ),collapse="")
  system(generateTolerancesCommand)
}
This works fine, but each iteration usually takes around 3-4 minutes, so the whole script currently takes around 2 hours to finish. I think I can do better by running the iterations in parallel. Each iteration launches an independent R process that works on its own data set. I tried reading about parallel libraries in R such as parallel and doParallel, but I am not able to figure out the best way to apply them to my use case. Can someone experienced with this suggest an approach?
If you don't want to create variables in the environment but just write some output files, you can simply replace your loop with foreach.
library(foreach)
cl <- parallel::makeCluster(parallel::detectCores() - 1)
doParallel::registerDoParallel(cl)
foreach(i = seq_along(fileNames), .combine = 'c') %dopar% {
  ## PUT YOUR CODE HERE
  NULL
}
parallel::stopCluster(cl)
The NULL here, together with .combine = 'c', is just so that foreach returns nothing (just a NULL), because foreach works more like an lapply than a for-loop: it collects the value of each iteration and combines them. You can learn more with this tutorial.
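To make the placeholder above concrete, here is a minimal sketch of how the command from the question could be run inside foreach. The helper names buildCommand and runAllFiles are hypothetical, and only two of the original flags are shown; the remaining ones (--sea, --freq, and so on) would be appended in the same way. Instead of returning NULL, this version returns the exit status of each system() call, which makes failed runs easier to spot.

```r
library(foreach)

# Hypothetical helper: builds the command line for one input file.
# Only two of the original flags are shown; the others would be
# added to the format string in the same way.
buildCommand <- function(fileName, modelScriptName, dimensionsFolder, outputFolder) {
  sprintf("Rscript %s --inp='%s/%s' --outp='%s/'",
          modelScriptName, dimensionsFolder, fileName, outputFolder)
}

# Hypothetical wrapper: runs one Rscript process per file on a local
# cluster and returns the exit status of each command (0 means success).
runAllFiles <- function(fileNames, modelScriptName, dimensionsFolder, outputFolder,
                        cores = max(1, parallel::detectCores() - 1, na.rm = TRUE)) {
  # Build every command up front so the workers only receive a plain string.
  commands <- vapply(fileNames, buildCommand, character(1),
                     modelScriptName = modelScriptName,
                     dimensionsFolder = dimensionsFolder,
                     outputFolder = outputFolder)
  cl <- parallel::makeCluster(cores)
  # Make sure the cluster is shut down even if an iteration fails.
  on.exit(parallel::stopCluster(cl), add = TRUE)
  doParallel::registerDoParallel(cl)
  foreach(cmd = commands, .combine = "c") %dopar% system(cmd)
}
```

A call would then look like runAllFiles(fileNames, modelScriptName, dimensionsFolder, outputFolder). Because each iteration already starts its own Rscript process, the workers do not need any of the model's packages or data loaded; they only launch the command and wait for it.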