简体   繁体   中英

faster alternative to compute colCumsums of a band matrix

I am new to R and stats.In the domain I am currently working in, I am required to compute the cumulative column sums in a unique manner.

Initially a square band matrix of width b and number of rows n is provided.For example for n = 8 and b = 3

0 1 2 7 0 0 0 0
0 0 3 6 7 0 0 0
0 0 0 3 1 7 0 0
0 0 0 0 4 4 7 0
0 0 0 0 0 5 8 7
0 0 0 0 0 0 1 8
0 0 0 0 0 0 0 4
0 0 0 0 0 0 0 0   

Then the matrix is to be transformed in such a way that anxb matrix with diagonals as columns are obtained.Like for the given example,

1 2 7  
3 6 7 
3 1 7 
4 4 7 
5 8 7 
1 8 0
4 0 0
0 0 0

I am currently using the following function to perform this operation.

     packedband <- function(x, n, b) {
      mat <- sapply(0:(b-1), function(i)
         diag(x[-(n:(n-i)), -(1:(1+i))])[1:n] )
      mat[is.na(mat)] <- 0
      return(mat)
      }

And then apply the colCumsums function from matrixStats packageto obtain the desired output matrix.For the given example,

1    2     7
4    8    14
7    9    21
11   13   28
16   21   35
17   29   35
21   29   35
21   29   35

What I am looking for is a faster computation of these operations since in the given domain,the number of columns(or rows) can be > 10^5.Probably the step of calculating packedband function can be removed since the end goal is to obtain cumulative column sum. Thanks in advance.

After messing about with sparse matrices, I think a for loop may work well here.

Try on original data

d = as.matrix(read.table(text="0 1 2 7 0 0 0 0
0 0 3 6 7 0 0 0
0 0 0 3 1 7 0 0
0 0 0 0 4 4 7 0
0 0 0 0 0 5 8 7
0 0 0 0 0 0 1 8
0 0 0 0 0 0 0 4
0 0 0 0 0 0 0 0 "))

colnames(d) <- NULL

Functions

packedband <- function(x, b=3) {
      n = nrow(d)
      mat <- sapply(0:(b-1), function(i)
                  diag(x[-(n:(n-i)), -(1:(1+i))])[1:n] )
      mat[is.na(mat)] <- 0
      matrixStats::colCumsums(mat)
   }

forloop <- function(d, b=3){
     n = nrow(d)
     m = matrix(0, n, b)
      for(i in 1:b) {
        ro = 1:(n-i)
        co = (1+i):n
        vec = `length<-`(d[cbind(ro, co)], n)
        vec[is.na(vec)] <- 0
        m[ , i] = cumsum(vec)
      }
     m
   }

# create initial sparse matrix just to omit time to convert
# as if its faster it may be worth storing your band matrices in sparse format
library(Matrix)
m <- as(d, "TsparseMatrix") 

spm <- function(m, b=3){
x = sparseMatrix(i = m@i+1,
                 j = m@j - m@i,
                 x = m@x,
                 dims = c(nrow(m),b))
matrixStats::colCumsums(as.matrix(x))
}

all.equal(forloop(d), packedband(d))
all.equal(spm(m), packedband(d))

Try with bigger data

d = matrix(0, 5e3, 5e3)
d[(col(d) - row(d)) == 1] <- 1
d[(col(d) - row(d)) == 2] <- 1
d[ (col(d) - row(d)) == 3] <- 1

m <- as(d, "TsparseMatrix") 

all.equal(forloop(d), packedband(d))
all.equal(spm(m), packedband(d))

microbenchmark::microbenchmark(packedband(d), forloop(d), spm(m), times=50)
# Unit: microseconds
#           expr         min          lq        mean      median          uq         max neval cld
#  packedband(d) 1348240.520 1724714.293 1740634.707 1733305.192 1763377.869 1960353.263    50   b
#     forloop(d)     720.344     973.658    1054.461    1026.807    1174.731    1565.912    50  a 
#         spm(m)    2145.875    2437.321    2586.503    2480.133    2749.019    3766.051    50  a 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM