
Speed up tapply R code

I have 100 matrices, each with 604800 columns and 101 rows. For each matrix, I need to reduce the number of columns to 60480 by averaging each block of 10 consecutive columns.

For example, for a vector

c(1,2,3,4,5,6,7,8,9,10,...)

The 5-column average would be:

c(3,8,13,18,...)

The code I am using to do this is:

col.av = tapply(col, rep(1:(length(col)/10), each = 10), mean)

Where col is one of my 101 x 604800 matrices. I have a for loop which iterates over the 100 matrices, but my problem is the time needed to compute a single run.

With just one matrix, it takes more than 20 minutes to execute, which is not feasible. Are there any suggestions on how I can improve the speed of computation?

Thanks

If you are fine with a for loop, this one works for your case:

col.av <- matrix(0, nrow(col), ncol(col)/10)
for (i in 1:ncol(col.av)) {
  col.av[,i] <- rowMeans(col[,(10*(i-1)+1):(10*i)])
}
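
As a quick sanity check of the loop above, here is a minimal sketch on a toy matrix. The names `toy`, `toy.av` and `window` are made up for illustration, and the window is 5 to match the vector example in the question:

```r
# Toy input (illustrative data only): 2 rows, 20 columns.
# Row 1 is 1..20, row 2 is 101..120.
toy <- rbind(1:20, 101:120)
window <- 5

# Same loop as above, with the block width pulled out into `window`
toy.av <- matrix(0, nrow(toy), ncol(toy) / window)
for (i in seq_len(ncol(toy.av))) {
  cols <- (window * (i - 1) + 1):(window * i)
  toy.av[, i] <- rowMeans(toy[, cols, drop = FALSE])
}

toy.av[1, ]  # 3 8 13 18, as in the question's example
```

Each output column is the row-wise mean of one block of `window` input columns, so the first row reproduces c(3,8,13,18).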

Or without a for-loop, using a custom function for readability. You can always wrap this in your own for-loop or in a call to apply.

#generate data
nc=604800 
nr=101
test_m <- matrix(rnorm(nc*nr),ncol=nc)

#function to get rowmeans by 'window'-columns
get_rowmeans <- function(mm, window=10){
  indices <- seq(1,ncol(mm),by=window)
  res <- sapply(indices, function(i){
    return(rowMeans(mm[,i:(i+(window-1))]))
  })
  res
}

tt <- get_rowmeans(test_m)
#check one
> all(tt[,1]==rowMeans(test_m[,1:10]))
[1] TRUE
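
Another possible variant, not from the answers above, avoids the per-block loop by viewing the matrix as a 3-d array and averaging over the within-block dimension. This is only a sketch: `block_col_means` and `small_m` are hypothetical names, the code assumes the number of columns is an exact multiple of the window, and it has not been benchmarked against the versions above.

```r
# Hypothetical helper: average every `window` consecutive columns of `mm`.
# Assumes ncol(mm) is an exact multiple of `window`.
block_col_means <- function(mm, window = 10) {
  stopifnot(ncol(mm) %% window == 0)
  nr <- nrow(mm)
  ng <- ncol(mm) / window
  # Reinterpret the data as nr x window x ng:
  # a[i, k, g] is mm[i, (g - 1) * window + k]
  a <- array(mm, dim = c(nr, window, ng))
  # Move the within-block index to the front, then average over it;
  # the result has dimensions nr x ng
  colMeans(aperm(a, c(2, 1, 3)))
}

# Small test matrix (random data, for illustration only)
set.seed(1)
small_m <- matrix(rnorm(101 * 600), nrow = 101)
fast <- block_col_means(small_m)

# Compare against block-wise rowMeans, up to floating-point tolerance
expected <- sapply(seq(1, ncol(small_m), by = 10),
                   function(i) rowMeans(small_m[, i:(i + 9)]))
isTRUE(all.equal(fast, expected))
```

Note that aperm makes a full copy of the data, which for a 101 x 604800 matrix of doubles is roughly 0.5 GB, so memory may be the limiting factor with this approach.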
