
Optimizing a loop

I am implementing an example given in the book The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani, Friedman).

My aim is to generate 10+10 means from two bivariate normal distributions, then use the first ten means to generate points labelled "Green" and the other ten means to generate "Red" points. For every point, the mean of the bivariate Gaussian it is drawn from has to be picked at random from the ten available means. I am not too familiar with R, so I used a for loop, which takes an awful lot of time as n gets bigger. Here's my code:

library(MASS)  # provides mvrnorm

Sigma <- diag(2)
greenMeans <- mvrnorm(n = 10, c(1, 0), Sigma)
redMeans <- mvrnorm(n = 10, c(0, 1), Sigma)

n <- 1000000
green <- array(dim = c(n, 2))
red <- array(dim = c(n, 2))

for (i in 1:n) {
    # Pick one of the ten means at random, then draw a single point around it
    green[i, ] <- mvrnorm(n = 1, greenMeans[sample(10, 1), ], Sigma / 5)
    red[i, ] <- mvrnorm(n = 1, redMeans[sample(10, 1), ], Sigma / 5)
}

You can skip the for loop entirely and use replicate, though I'm not sure how much faster it is:

do_stuff <- function() {
    newGreen <- mvrnorm(n = 1, greenMeans[sample(10, 1), ], Sigma / 5)
    newRed <- mvrnorm(n = 1, redMeans[sample(10, 1), ], Sigma / 5)
    list(newGreen, newRed)
}
# Call the function (do_stuff(), not do_stuff) so each replication draws new points
res <- replicate(10000, do_stuff(), simplify = FALSE)
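
Another option that is likely faster for large n is to drop the per-point calls altogether. A point drawn from a Gaussian with mean m and covariance Sigma/5 is the same as m plus zero-mean noise with covariance Sigma/5, so all the means can be picked in one sample() call and all the noise drawn in one mvrnorm() call. A minimal sketch, assuming MASS is installed; sample_mixture is a hypothetical helper name, not something from the question or from MASS:

```r
library(MASS)  # provides mvrnorm

# Hypothetical helper: draw n points from an equal-weight mixture of
# Gaussians centred on the rows of `means`, each with covariance `Sigma`.
sample_mixture <- function(means, n, Sigma) {
  # Pick a component (a row of `means`) for every point in one call.
  idx <- sample(nrow(means), n, replace = TRUE)
  # Zero-mean noise for all points at once; matrix() keeps the n x d shape
  # even when n == 1, where mvrnorm returns a plain vector.
  noise <- matrix(mvrnorm(n = n, mu = rep(0, ncol(means)), Sigma = Sigma),
                  nrow = n)
  means[idx, , drop = FALSE] + noise
}

set.seed(1)  # arbitrary seed, only for reproducibility
Sigma <- diag(2)
greenMeans <- mvrnorm(n = 10, c(1, 0), Sigma)
redMeans <- mvrnorm(n = 10, c(0, 1), Sigma)

n <- 100000
green <- sample_mixture(greenMeans, n, Sigma / 5)
red <- sample_mixture(redMeans, n, Sigma / 5)
```

Both green and red come back as n x 2 matrices, the same shape the loop fills in. I have not timed this against the loop, so treat the speed-up as an expectation rather than a measurement.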
