I am new to R and stats.In the domain I am currently working in, I am required to compute the cumulative column sums in a unique manner.
Initially a square band matrix of width b and number of rows n is provided.For example for n = 8 and b = 3
0 1 2 7 0 0 0 0
0 0 3 6 7 0 0 0
0 0 0 3 1 7 0 0
0 0 0 0 4 4 7 0
0 0 0 0 0 5 8 7
0 0 0 0 0 0 1 8
0 0 0 0 0 0 0 4
0 0 0 0 0 0 0 0
Then the matrix is to be transformed in such a way that anxb matrix with diagonals as columns are obtained.Like for the given example,
1 2 7
3 6 7
3 1 7
4 4 7
5 8 7
1 8 0
4 0 0
0 0 0
I am currently using the following function to perform this operation.
packedband <- function(x, n, b) {
mat <- sapply(0:(b-1), function(i)
diag(x[-(n:(n-i)), -(1:(1+i))])[1:n] )
mat[is.na(mat)] <- 0
return(mat)
}
And then apply the colCumsums function from matrixStats packageto obtain the desired output matrix.For the given example,
1 2 7
4 8 14
7 9 21
11 13 28
16 21 35
17 29 35
21 29 35
21 29 35
What I am looking for is a faster computation of these operations since in the given domain,the number of columns(or rows) can be > 10^5.Probably the step of calculating packedband function can be removed since the end goal is to obtain cumulative column sum. Thanks in advance.
After messing about with sparse matrices, I think a for
loop may work well here.
Try on original data
d = as.matrix(read.table(text="0 1 2 7 0 0 0 0
0 0 3 6 7 0 0 0
0 0 0 3 1 7 0 0
0 0 0 0 4 4 7 0
0 0 0 0 0 5 8 7
0 0 0 0 0 0 1 8
0 0 0 0 0 0 0 4
0 0 0 0 0 0 0 0 "))
colnames(d) <- NULL
Functions
packedband <- function(x, b=3) {
n = nrow(d)
mat <- sapply(0:(b-1), function(i)
diag(x[-(n:(n-i)), -(1:(1+i))])[1:n] )
mat[is.na(mat)] <- 0
matrixStats::colCumsums(mat)
}
forloop <- function(d, b=3){
n = nrow(d)
m = matrix(0, n, b)
for(i in 1:b) {
ro = 1:(n-i)
co = (1+i):n
vec = `length<-`(d[cbind(ro, co)], n)
vec[is.na(vec)] <- 0
m[ , i] = cumsum(vec)
}
m
}
# create initial sparse matrix just to omit time to convert
# as if its faster it may be worth storing your band matrices in sparse format
library(Matrix)
m <- as(d, "TsparseMatrix")
spm <- function(m, b=3){
x = sparseMatrix(i = m@i+1,
j = m@j - m@i,
x = m@x,
dims = c(nrow(m),b))
matrixStats::colCumsums(as.matrix(x))
}
all.equal(forloop(d), packedband(d))
all.equal(spm(m), packedband(d))
Try with bigger data
d = matrix(0, 5e3, 5e3)
d[(col(d) - row(d)) == 1] <- 1
d[(col(d) - row(d)) == 2] <- 1
d[ (col(d) - row(d)) == 3] <- 1
m <- as(d, "TsparseMatrix")
all.equal(forloop(d), packedband(d))
all.equal(spm(m), packedband(d))
microbenchmark::microbenchmark(packedband(d), forloop(d), spm(m), times=50)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# packedband(d) 1348240.520 1724714.293 1740634.707 1733305.192 1763377.869 1960353.263 50 b
# forloop(d) 720.344 973.658 1054.461 1026.807 1174.731 1565.912 50 a
# spm(m) 2145.875 2437.321 2586.503 2480.133 2749.019 3766.051 50 a
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.