R: Speed up Vectorize for a square matrix

Can anyone help me speed up this code?

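library(tseries)  # provides adf.test

n <- seq_len(ncol(mat))  # 1, 2, ..., ncol(mat)
sym.pr <- outer(n, n, Vectorize(function(a, b) {
  # p-value of the ADF test on the residuals of regressing column a on column b
  adf.test(LinReg(mat[, c(a, b)]), k = 0, alternative = "stationary")$p.value
}))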

Here mat is an N x M matrix of N observations and M objects, e.g.:

    Obj1 Obj2 Obj3
1      .    .    .
2      .    .    .    
3      .    .    .

LinReg is defined as:

# Performs linear regression via OLS
LinReg=function(vals) {  
  # regression analysis
  # no intercept: the "+0" term forces the fitted line through the origin
  regline<-lm(vals[,1]~as.matrix(vals[,2:ncol(vals)])+0)

  # return spread (residuals)
  return(as.matrix(regline$residuals))
}

Basically, I am performing a regression analysis (OLS) on every pair of objects in mat (i.e. Obj1 and Obj2, Obj2 and Obj3, Obj1 and Obj3), then running the adf.test function from the tseries package on the residuals and storing the p-value. The end result, sym.pr, is a symmetric matrix of all the p-values (strictly speaking it is not 100% symmetric, but treating it as symmetric will suffice).

With the above code, a 600x300 matrix (600 observations and 300 objects) takes about 15 minutes.

I thought of calculating only the upper triangle of the symmetric matrix, but I am not sure how to go about it.

Any ideas?

Thanks.

Starting with some dummy data:

mdf <- data.frame( x1 = rnorm(5), x2 = rnorm(5), x3 = rnorm(5) )

I would first determine the combinations of interest. If I understood you correctly, the result of your calculation should be the same for mdf[c(i,j)] and mdf[c(j,i)]. In that case you could use the combn function to determine the relevant pairs:

pairs <- as.data.frame( t( combn( colnames( mdf  ),2 ) ) )
pairs
  V1 V2
1 x1 x2
2 x1 x3
3 x2 x3
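
As a rough check on how much work this saves for the 300-object matrix from the question: outer visits every ordered pair, including the diagonal, while combn lists each unordered pair once. A small sketch of the arithmetic (the variable names are only for illustration):

```r
m <- 300                    # number of objects (columns) in the question
n_outer <- m * m            # calls made by outer(n, n, ...): 90000
n_pairs <- choose(m, 2)     # unordered pairs listed by combn: 44850
n_pairs / n_outer           # fraction of the original calls, just under one half
```

So restricting to the unordered pairs should roughly halve the number of regressions and tests, assuming the per-pair cost stays the same.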

Now you can apply your function row-wise over the pairs (using t.test here for simplicity):

pairs[["p.value"]] <- apply( pairs, 1, function( i ){
  t.test( mdf[i] )[["p.value"]]
})
pairs
  V1 V2   p.value
1 x1 x2 0.5943814
2 x1 x3 0.7833293
3 x2 x3 0.6760846
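
For the original problem, the t.test call would be replaced by the ADF test on the regression residuals. Calling lm tens of thousands of times carries some overhead (formula parsing, model-frame construction), so it may also help to compute the residuals directly. For a no-intercept regression with a single regressor, which is the two-column case used in the question, the OLS slope has a closed form. The helper name fast_resid below is hypothetical and not part of the original post; this is a minimal sketch under that two-column assumption:

```r
# Residuals of y ~ x + 0 (regression through the origin) in closed form.
# fast_resid is a hypothetical helper; it assumes vals has exactly two
# columns: the response first, the single regressor second.
fast_resid <- function(vals) {
  y <- vals[, 1]
  x <- vals[, 2]
  beta <- sum(x * y) / sum(x * x)   # OLS slope without an intercept
  y - beta * x                      # residuals (the "spread")
}

# Compare against lm on random data.
set.seed(1)
vals <- cbind(rnorm(50), rnorm(50))
ref <- residuals(lm(vals[, 1] ~ vals[, 2] + 0))
all.equal(unname(ref), fast_resid(vals))
```

The comparison should agree with lm up to numerical tolerance. Note that this sketch returns a plain numeric vector rather than the one-column matrix that LinReg returns; adf.test accepts a numeric vector.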

If you still need the p-values in (upper triangular) matrix form, you can cast them:

library(reshape2)
acast( pairs, V1 ~ V2, value.var = "p.value" )
          x2        x3
x1 0.5943814 0.7833293
x2        NA 0.6760846
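
If a full square matrix like sym.pr from the question is wanted, the same idea can be written in base R with integer column indices from combn and matrix indexing. The function name pairwise_matrix and its argument f are hypothetical, introduced only for this sketch, and the example uses a correlation as a stand-in statistic so that it runs without extra packages:

```r
# pairwise_matrix is a hypothetical helper: it applies f to every unordered
# pair of columns of mat and mirrors the result into a full square matrix.
pairwise_matrix <- function(mat, f) {
  m <- ncol(mat)
  idx <- combn(m, 2)                          # 2 x choose(m, 2) column indices
  vals <- apply(idx, 2, function(p) f(mat[, p]))
  out <- matrix(NA_real_, m, m,
                dimnames = list(colnames(mat), colnames(mat)))
  out[t(idx)] <- vals                         # upper triangle (row < col)
  out[t(idx[2:1, , drop = FALSE])] <- vals    # mirror into the lower triangle
  out                                         # diagonal stays NA
}

set.seed(1)
mat <- matrix(rnorm(40), ncol = 4,
              dimnames = list(NULL, paste0("Obj", 1:4)))
# Stand-in statistic for illustration: correlation of the two columns.
res <- pairwise_matrix(mat, function(v) cor(v[, 1], v[, 2]))
```

For the question's use case, f would presumably be something like function(v) adf.test(LinReg(v), k = 0, alternative = "stationary")$p.value, with tseries loaded. Mirroring the upper triangle treats the result as exactly symmetric, which the question notes is only approximately true.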
