
Speed up glm in apply function in R

My question is based on the following situation:
I have a matrix with 20 rows and > 100,000 columns. I would like to apply the glm function and extract the likelihood-ratio statistic (deviance) for each column. So far I have implemented it like this:

X <- gl(5, 4, length = 20)  # gl() already returns a factor,
Y <- gl(4, 1, length = 20)  # so no extra factor() call is needed
matrix <- matrix(sample.int(15, size = 20 * 100000, replace = TRUE),
                 nrow = 20, ncol = 100000)  # note: this name shadows base::matrix()
apply(matrix, 2, function(x) glm(x ~ X + Y, poisson)$deviance)

Is there any way to speed up the computation? Since each vector passed to glm is small (length 20), I figured speedglm would not help here.

I would be glad if anyone could give me advice on this. Thank you very much in advance!

I ran a test on 1,000 columns. It only took 2.4 seconds:

system.time(apply(matrix[,1:1000], 2, function(x) glm(x ~ X+Y, poisson)$deviance))

   user  system elapsed 
   2.40    0.00    2.46

I also tried 50,000 columns and it scaled very linearly.

So computing all 100,000 columns should take only about 4 minutes, which doesn't look like a real problem. The bottleneck is the overhead of calling the high-level glm() function 100,000 times; try to avoid running a high-level function that many times.
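One cheap way to cut that per-call overhead (a sketch of the idea, not part of the original answer): build the model matrix once with model.matrix() and call the lower-level glm.fit() directly, which skips the formula parsing and model-frame construction that glm() repeats on every call. The example below uses a smaller 1,000-column matrix named `mat` for illustration:

```r
# Build the design matrix once, then reuse it for every column.
X <- gl(5, 4, length = 20)
Y <- gl(4, 1, length = 20)
mat <- matrix(sample.int(15, size = 20 * 1000, replace = TRUE), nrow = 20)

mm <- model.matrix(~ X + Y)  # computed a single time

# glm.fit() is the workhorse that glm() calls internally,
# so the deviances match the formula interface.
dev_fast <- apply(mat, 2, function(y)
  glm.fit(mm, y, family = poisson())$deviance)

# Reference: the original formula-based approach.
dev_slow <- apply(mat, 2, function(y) glm(y ~ X + Y, poisson)$deviance)
all.equal(dev_fast, dev_slow)
```

This keeps the code in base R while removing a large, fixed chunk of work from each of the 100,000 iterations.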

To run faster, listed in ascending order of effort:

  • wrap it in a parallel loop (2x-4x speed-up)
  • figure out how to perform the calculation as matrix operations in R (~50x)
  • implement it with Rcpp (~100x)

None of these solutions will take you less than 4 minutes of your own time to implement, though.
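For the first option, a minimal sketch using the base parallel package (the cluster size and the 1,000-column `mat` below are illustrative; in practice use something like detectCores() - 1 workers):

```r
library(parallel)

X <- gl(5, 4, length = 20)
Y <- gl(4, 1, length = 20)
mat <- matrix(sample.int(15, size = 20 * 1000, replace = TRUE), nrow = 20)

cl <- makeCluster(2)             # illustrative; size to your machine
clusterExport(cl, c("X", "Y"))   # workers need the predictors
dev <- parApply(cl, mat, 2, function(x) glm(x ~ X + Y, poisson)$deviance)
stopCluster(cl)
```

A PSOCK cluster as above works on all platforms; on Unix-alikes, mclapply() over columns is an alternative that avoids the explicit clusterExport() step.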
