
Parallel Processing Example in R

Firstly, I would like to say that I am new to this topic.

Secondly, although I have read a lot about parallel processing in R, I am still not confident about it.

I have made up a small simulation in R. Can someone help me parallelize this invented code, so I can see how parallel processing works?

My code is as follows (it works with large numbers):

SimulateFn <- function(B, n) {
  M1 <- list()
  for (i in 1:B) {
    M1[i] <- n^2
  }
  return(M1)
}

SimulateFn(100000000, 300000)

Could you please help me?

First of all, parallelization is the procedure of dividing a task into subtasks, which are processed simultaneously by multiple processors or cores. The subtasks can be independent of each other or share some dependency - the latter case needs more planning and attention.

This procedure has some overhead for scheduling the subtasks - for example, copying data to each processor. Because of that, parallelization is worthless for fast computations. In your example, the three main operations are indexing ( [ ), assignment ( <- ), and a (fast) math operation ( ^ ). The overhead of parallelization may be greater than the time needed to execute each subtask, and in that case parallelization can result in poorer performance!
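To see this overhead directly, here is a minimal sketch using the base parallel package (the cluster size of 2 and the 10000 iterations are arbitrary choices, not from the question). A trivially cheap task is timed both sequentially and in parallel; the results are identical, but the parallel timing is typically no better, and often worse, because of scheduling and communication costs:

```r
library(parallel)

cheap <- function(i) i^2   # a trivially cheap task

cl <- makeCluster(2)
seq_res  <- lapply(1:10000, cheap)
seq_time <- system.time(lapply(1:10000, cheap))["elapsed"]
par_res  <- parLapply(cl, 1:10000, cheap)
par_time <- system.time(parLapply(cl, 1:10000, cheap))["elapsed"]
stopCluster(cl)

## seq_res and par_res are identical, but par_time usually
## exceeds seq_time: the work per element is far smaller than
## the cost of shipping it to a worker process
```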


Despite that, simple parallelization in R is fairly easy. One approach to parallelizing your task, using the doParallel package, is shown below. Other approaches include packages such as parallel.

library(doParallel)
## choose the number of processors/cores
cl <- makeCluster(2)
registerDoParallel(cl)
## system.time() records the elapsed time of a code snippet;
## foreach returns its results as a list, so we capture them in M1
## (assignments made inside %dopar% happen on the workers and
## would not propagate back to the master)
B <- 100000; n <- 300000
## %dopar% executes the loop body in parallel
ptime <- system.time({
  M1 <- foreach(i = 1:B) %dopar% {
    n^2
  }
})
## %do% executes sequentially
stime <- system.time({
  M1 <- foreach(i = 1:B) %do% {
    n^2
  }
})

The elapsed times on my computer (2 cores) were 59.472 and 44.932 seconds, respectively. Clearly, there was no improvement from parallelization: indeed, performance was worse!
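As an aside (a sketch not from the original answer): for element-wise arithmetic this cheap, R's built-in vectorized operations beat any explicit loop, sequential or parallel, because the per-element work is tiny compared to the loop machinery itself:

```r
B <- 100000; n <- 300000
## vectorized equivalent of the loop: compute n^2 once,
## replicate it B times, and convert to a list in one step
vtime <- system.time({
  M1 <- as.list(rep(n^2, B))
})
```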


A better example is shown below, where the main task is much more computationally expensive:

## bootstrap of logistic-regression coefficients
x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 10000
ptime <- system.time({
  r <- foreach(icount(trials), .combine = cbind) %dopar% {
    ind <- sample(100, 100, replace = TRUE)
    result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result1)
  }
})
stime <- system.time({
  r <- foreach(icount(trials), .combine = cbind) %do% {
    ind <- sample(100, 100, replace = TRUE)
    result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
    coefficients(result1)
  }
})

And the elapsed times were 24.709 and 34.502 seconds: a gain of about 28%.
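One detail the snippets above leave out: when you are finished, release the worker processes and, optionally, restore the sequential backend. A minimal, self-contained sketch (the toy loop over 1:4 is just for illustration):

```r
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)
res <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}
stopCluster(cl)   # shut down the worker processes
registerDoSEQ()   # later %dopar% calls now run sequentially
```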
