
Speed up glm in apply function in R

My question is based on the following situation:
I have a matrix with 20 rows and more than 100,000 columns. I would like to fit a glm to each column and extract the likelihood ratio statistic. So far, I have implemented it this way, for example:

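# Design: X has 5 levels (each repeated 4 times), Y has 4 levels (cycling); gl() already returns factors
X <- gl(5, 4, length = 20); Y <- gl(4, 1, length = 20)
# 20 x 100,000 matrix of simulated counts
matrix <- matrix(sample.int(15, size = 20*100000, replace = TRUE), nrow = 20, ncol = 100000)
# Fit a Poisson GLM to every column and keep its residual deviance
apply(matrix, 2, function(x) glm(x ~ X + Y, family = poisson)$deviance)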
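For a Poisson GLM, the residual deviance returned by glm() is the likelihood ratio statistic comparing the fitted model with the saturated model (the model whose fitted means equal the observed counts). A quick check on one simulated column, as a sketch; the seed and the names `x`, `fit`, `ll_saturated` and `lr_stat` are illustrative only:

```r
# Check on one simulated column that glm()'s residual deviance equals the
# likelihood ratio statistic against the saturated model (where mu_hat = y).
X <- gl(5, 4, length = 20)
Y <- gl(4, 1, length = 20)
set.seed(42)                                   # illustrative seed
x <- sample.int(15, size = 20, replace = TRUE)
fit <- glm(x ~ X + Y, family = poisson)
# Log-likelihood of the saturated model: each mean equals its own observation
ll_saturated <- sum(dpois(x, lambda = x, log = TRUE))
lr_stat <- 2 * (ll_saturated - as.numeric(logLik(fit)))
all.equal(lr_stat, fit$deviance)               # TRUE: equal up to floating-point error
```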

Is there any way to reduce the computation time? Since each vector passed to glm is tiny (length 20), I figured that speedglm would not help here.

I would be glad if anyone could give me advice on this. Thank you very much in advance!

I ran a test on 1,000 columns. It took only about 2.4 seconds:

system.time(apply(matrix[,1:1000], 2, function(x) glm(x ~ X+Y, poisson)$deviance))

   user  system elapsed 
   2.40    0.00    2.46

I also tried 50,000 columns, and the run time scaled roughly linearly.

Therefore, computing all 100,000 columns should take only about 4 minutes (2.4 s per 1,000 columns × 100), so I don't see the problem. That said, the bottleneck is the overhead of calling the high-level glm() function 100,000 times. Try to avoid calling a high-level function that many times.
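One way to cut part of that per-call overhead, sketched here as an assumption rather than something benchmarked in this thread: build the design matrix and the family object once, and call the lower-level glm.fit() (which glm() itself calls internally) for each column. This skips the formula parsing and model-frame construction that glm() repeats on every call. The names `design`, `fam`, `counts` and `dev_fit` are illustrative.

```r
# Sketch: reuse one design matrix and one family object, and call glm.fit()
# directly, skipping the formula/model-frame work glm() redoes for every column.
X <- gl(5, 4, length = 20)
Y <- gl(4, 1, length = 20)
design <- model.matrix(~ X + Y)   # 20 x 8: intercept, 4 dummies for X, 3 for Y
fam <- poisson()                  # Poisson family with log link, built once
set.seed(1)
counts <- matrix(sample.int(15, size = 20 * 1000, replace = TRUE), nrow = 20)
# Same residual deviance as glm(x ~ X + Y, family = poisson)$deviance, per column
dev_fit <- apply(counts, 2, function(col) glm.fit(design, col, family = fam)$deviance)
```

The deviances should match glm() because both run the same IRLS fit on the same design matrix; how much time this saves depends on your machine, so time it with system.time() before relying on it.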

To make it run faster, in ascending order of effort:

  • wrap it in a parallel loop (2x to 4x speed-up)
  • figure out how to express the calculation as matrix multiplications in R (~50x)
  • implement it with Rcpp (~100x)
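A minimal sketch of the first, lowest-effort option, using the base `parallel` package. The worker count of 2 is an assumption (something like detectCores() - 1 is a common choice), the column count is reduced for illustration, and any speed-up depends on your hardware.

```r
library(parallel)

X <- gl(5, 4, length = 20)
Y <- gl(4, 1, length = 20)
set.seed(1)
counts <- matrix(sample.int(15, size = 20 * 500, replace = TRUE), nrow = 20)

n_workers <- 2                       # assumption: adjust to your machine
cl <- makeCluster(n_workers)         # PSOCK cluster: works on Windows and Unix alike
# Copy the design factors to every worker so the formula can find them
clusterExport(cl, c("X", "Y"), envir = environment())
# Columns are split into chunks, one per worker, and the results are reassembled
dev_par <- parApply(cl, counts, 2, function(x) glm(x ~ X + Y, family = poisson)$deviance)
stopCluster(cl)                      # always release the worker processes
```

A PSOCK cluster is used instead of mclapply() because forking is not available on Windows; on Linux or macOS, mclapply() over column indices would work as well.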

Implementing any of these solutions will take you longer than the 4 minutes you would spend just waiting.
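For the matrix-multiplication route, the specific design in the question happens to allow a closed form. Every (X, Y) combination occurs exactly once, so each column is a complete 5 × 4 two-way table, and x ~ X + Y is the Poisson independence model. Its maximum likelihood means are (X-level total × Y-level total) / grand total, and its deviance is G² = 2 Σ y log(y / μ̂). The helper below is a hypothetical sketch built on that assumption, not the answerer's code, and it is only valid for designs with exactly one observation per cell (it stops otherwise).

```r
# Hypothetical helper: closed-form deviance of the additive Poisson model
# x ~ X + Y for every column at once, assuming each (X, Y) combination
# occurs exactly once (each column is a complete two-way table).
independence_deviance <- function(counts, X, Y) {
  stopifnot(all(table(X, Y) == 1), nrow(counts) == length(X))
  A <- model.matrix(~ X - 1)           # 20 x 5 indicators for the levels of X
  B <- model.matrix(~ Y - 1)           # 20 x 4 indicators for the levels of Y
  row_tot <- crossprod(A, counts)      # totals per X level, one column per data column
  col_tot <- crossprod(B, counts)      # totals per Y level, one column per data column
  grand <- colSums(counts)             # grand total of each data column
  # MLE of the means, mapped back to the 20 rows: row total * column total / grand total
  mu <- (A %*% row_tot) * (B %*% col_tot) / rep(grand, each = nrow(counts))
  # G^2 = 2 * sum(y * log(y / mu)); cells with y == 0 contribute 0
  terms <- ifelse(counts > 0, counts * log(counts / mu), 0)
  2 * colSums(terms)
}

X <- gl(5, 4, length = 20); Y <- gl(4, 1, length = 20)
set.seed(1)
counts <- matrix(sample.int(15, size = 20 * 100000, replace = TRUE), nrow = 20)
dev_all <- independence_deviance(counts, X, Y)   # all 100,000 columns in one pass
```

The usual -(y - μ̂) term of the Poisson deviance is dropped because, with an intercept and a log link, the fitted means sum to the observed total in every column. For any other design, fall back to glm() or glm.fit().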

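Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.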

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM