
Condensing a for-loop in R

I have the following code in R

library(mvtnorm)

m = matrix(rnorm(2000000),nrow=200)
A = matrix(rnorm(40000),ncol=200)
A = A%*%t(A)
C = array(A,c(200,200,10000))

B = 10000
S = 100

postpred = array(NA,c(200,S,B))
for(i in 1:B){
    postpred[,,i] = t(rmvnorm(S,m[,i],C[,,i],method="svd"))
}

However, this code is extremely slow: the loop runs 10,000 times, each iteration draws 100 samples from a multivariate normal, and m and C can be very large as well. I would like to compute postpred without a loop. I have tried using the apply function, but to no avail. Any help or suggestions would be greatly appreciated.

Others have pointed out that apply (and similar functions) won't help you much in your case, and they are right.

For what it is worth, I checked whether you would gain any performance by byte-compiling your code. Here is a small benchmark based on your problem (I reduced the size of the matrices, because otherwise I cannot run it):

library(mvtnorm)

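# Reduced-size version of the loop from the question
slow_func = function()
{ 
  m = matrix(rnorm(100000),nrow=100)   # 100 x 1000: one mean vector per iteration
  A = matrix(rnorm(10000),ncol=100)
  A = A%*%t(A)                         # symmetric positive semi-definite covariance
  C = array(A,c(100,100,1000))

  B = 1000
  S = 10

  # first dimension must match nrow(m), i.e. 100
  postpred = array(NA,c(100,S,B))
  for(i in 1:B){
    postpred[,,i] = t(rmvnorm(S,m[,i],C[,,i],method="svd"))
  }
}

# byte-compile the function
library(compiler)
slow_func_compiled <- cmpfun(slow_func)

library(microbenchmark)

microbenchmark(slow_func_compiled(), slow_func(), times=10) # grab a coffee, this takes some time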

The results show that compiling won't give you any advantage:

Unit: seconds
                 expr      min       lq   median       uq      max neval
 slow_func_compiled() 9.938632 10.12269 10.18237 10.48215 15.43299    10
          slow_func() 9.969320 10.07676 10.21916 15.44664 15.66109    10

(This was to be expected, since the time is spent inside rmvnorm and the mvtnorm library should already be compiled.)

Overall, you have only two ways left to optimize your code in R:

  1. use smaller numbers (if acceptable)
  2. parallelize your code

As Josillber says, vectorisation with the apply family of functions isn't going to do much for you here. It really is a bit of an R myth that apply gives significant speed improvements: it still calls the function once per element, just as a for loop does.
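To illustrate the point, here is a minimal sketch that writes the loop from the question once with for and once with vapply. The sizes are deliberately tiny, and the names d, loop_version and apply_version are made up for this example. Both versions make exactly B calls to rmvnorm, so any timing difference should be small.

```r
library(mvtnorm)

# Tiny sizes so this runs quickly (the question uses d = 200, B = 10000, S = 100)
d <- 20; B <- 50; S <- 10

set.seed(1)
m <- matrix(rnorm(d * B), nrow = d)
A <- matrix(rnorm(d * d), ncol = d)
A <- A %*% t(A)
C <- array(A, c(d, d, B))

# Explicit loop, as in the question
loop_version <- function() {
  out <- array(NA_real_, c(d, S, B))
  for (i in seq_len(B)) {
    out[, , i] <- t(rmvnorm(S, m[, i], C[, , i], method = "svd"))
  }
  out
}

# Same computation with vapply: still one rmvnorm call per i
apply_version <- function() {
  vapply(seq_len(B),
         function(i) t(rmvnorm(S, m[, i], C[, , i], method = "svd")),
         FUN.VALUE = matrix(0, d, S))
}

# With the same seed, both versions make the same calls in the same order
set.seed(42); a <- loop_version()
set.seed(42); b <- apply_version()
same_draws <- all(a == b)

# Compare timings on your own machine
system.time(loop_version())
system.time(apply_version())
```

If same_draws is TRUE, the two versions are doing the same work; the apply form only changes how the loop is written.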

I suggest you look at the parallel options: mclapply in the parallel package, and the snow package. Read more here: http://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf
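A rough sketch of what that could look like with mclapply, assuming a Unix-like system (mclapply relies on forking, which is not available on Windows). The sizes are reduced, and the helper draw_one and the choice of 2 cores are only assumptions for illustration.

```r
library(mvtnorm)
library(parallel)

# Reduced sizes for illustration (the question uses d = 200, B = 10000, S = 100)
d <- 20; B <- 50; S <- 10

set.seed(1)
m <- matrix(rnorm(d * B), nrow = d)
A <- matrix(rnorm(d * d), ncol = d)
A <- A %*% t(A)
C <- array(A, c(d, d, B))

# One task per iteration: returns a d x S matrix of draws
draw_one <- function(i) t(rmvnorm(S, m[, i], C[, , i], method = "svd"))

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply(seq_len(B), draw_one, mc.cores = n_cores)

# Each element of res is d x S; stack them into a d x S x B array
postpred <- array(unlist(res), c(d, S, B))
```

The iterations are independent, so they split cleanly across cores. Whether this pays off depends on the number of cores and on memory, since the full-size C array is large. For reproducible parallel random numbers, the parallel documentation linked above describes RNGkind("L'Ecuyer-CMRG").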
