
Fast matrix computation in R

I need to compute:

[image: formula not available]

I can further decompose this into:

[image: decomposed formula not available]

In R I wrote this code:

att_num <- dim(X)[2]
A <- matrix(0, att_num, att_num)
for(i in seq(att_num)) A[,i] <- colSums(dx * X * X[,i])

But it is terribly slow because of the loop. This line takes most of the computing time in my script. Is there a way I can speed up this computation?

  • dx is a vector of size [1 x m]
  • X is a matrix of size [n x m]

Example:

dx <- sample(1:100, 30, replace=T)
X <- data.frame(replicate(30,sample(0:1,100,rep=TRUE)))

att_num <- dim(X)[2]
A <- matrix(0, att_num, att_num)
for(i in seq(att_num)) A[,i] <- colSums(dx * X * X[,i])

set.seed(42)
dx <- sample(1:100, 30, replace = TRUE)
X <- data.frame(replicate(10, sample(0:1, 100, replace = TRUE)))

att_num <- dim(X)[2]
A <- matrix(0, att_num, att_num)
for(i in seq(att_num)) A[,i] <- colSums(dx * X * X[,i])

B <- crossprod(as.matrix(dx * X), as.matrix(X))

all.equal(A, unname(B))
#[1] TRUE
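
One way to see why `crossprod` reproduces the loop: entry `A[j, i]` is the sum over rows of the weighted column `j` times column `i`, which is exactly what `crossprod(dx * X, X)` computes. The sketch below assumes one weight per row of `X` (the example above uses 30 weights for 100 rows, so this is an assumption, not the original setup). Under that assumption the result is also `t(X) %*% diag(dx) %*% X`.

```r
set.seed(42)
n <- 100
m <- 10
X <- matrix(sample(0:1, n * m, replace = TRUE), nrow = n)
dx <- sample(1:100, n, replace = TRUE)  # assumption: one weight per row of X

# Original loop, one column of A at a time
A_loop <- matrix(0, m, m)
for (i in seq_len(m)) A_loop[, i] <- colSums(dx * X * X[, i])

# Vectorized: t(dx * X) %*% X without forming the transpose explicitly
A_cross <- crossprod(dx * X, X)

# Explicit weighted form, only valid when length(dx) == nrow(X)
A_diag <- t(X) %*% diag(dx) %*% X

ok_cross <- isTRUE(all.equal(A_loop, A_cross))
ok_diag <- isTRUE(all.equal(A_loop, A_diag))
```

`crossprod` avoids building the `n x n` diagonal matrix, so it is usually the cheaper of the two vectorized forms when `n` is large.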

Assuming x_i are the columns of X, then you can do it in a vectorized fashion using the matrix multiplication operator %*% :

library(Matrix)
set.seed(1234)
nrows <- 100
ncols <- 30 # same as length(dx)
dx <- sample(1:100, ncols, replace = TRUE)
X <- matrix(sample(0:1, nrows*ncols, replace = TRUE), nrow = nrows, ncol = ncols)
A <- X %*% Diagonal(length(dx), dx) %*% t(X)

If X has a ton of zeros, I would highly recommend that you put it in a sparse format (check out sparseMatrix from the Matrix package). Note that the diagonal matrix in the middle is actually sparse. This saves A LOT of memory and computation.
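
A minimal sketch of the sparse variant, assuming the Matrix package is installed and that `X` is mostly zeros (the 95% zero rate here is made up for illustration). It uses `Matrix(X, sparse = TRUE)` rather than `sparseMatrix` because the data already sits in a dense matrix, and it checks the result against the dense product.

```r
library(Matrix)

set.seed(1234)
nrows <- 100
ncols <- 30  # same as length(dx)
dx <- sample(1:100, ncols, replace = TRUE)

# Assumption for illustration: about 95% of the entries of X are zero
X <- matrix(sample(0:1, nrows * ncols, replace = TRUE, prob = c(0.95, 0.05)),
            nrow = nrows, ncol = ncols)

Xs <- Matrix(X, sparse = TRUE)  # sparse copy of X

# Same product as above, with every factor stored sparsely
A_sparse <- Xs %*% Diagonal(length(dx), dx) %*% t(Xs)

# Dense reference for comparison
A_dense <- X %*% diag(dx) %*% t(X)

same <- isTRUE(all.equal(as.matrix(A_sparse), A_dense, check.attributes = FALSE))
```

Whether the sparse version is actually faster depends on how sparse `X` is, so it is worth timing both on real data.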

NOTE 1: In the comments below, Roland noted that dx does not have as many elements as X has rows. I would suggest checking exactly what you want to compute, because usually the two should match! Also, normally x_i are the columns of X. If you post more information (including, for example, the limits of the index in the sum), I can help you more.
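
To illustrate the length mismatch, here is a small sketch of what `dx * X` does when `dx` has 30 elements and `X` has 100 rows and 30 columns. It uses a plain matrix instead of the data frame from the question. R recycles `dx` down the columns, so the weight hitting a given row changes from column to column. The name `W` and the final check are only for illustration.

```r
set.seed(42)
X <- matrix(sample(0:1, 100 * 30, replace = TRUE), nrow = 100)  # 100 rows, 30 columns
dx <- sample(1:100, 30, replace = TRUE)                          # 30 weights, as in the question

# dx is recycled in column-major order to fill the shape of X
W <- matrix(rep_len(dx, length(X)), nrow = nrow(X))
same_product <- identical(dx * X, W * X)

# Weights applied to row 1 in columns 1 to 4: dx[1], dx[11], dx[21], dx[1]
first_row_weights <- W[1, 1:4]

# Check to run before computing (assumption: one weight per row is intended)
lengths_match <- length(dx) == nrow(X)
```

If `lengths_match` is `FALSE`, as it is here, the weights are neither per row nor per column, which is probably not what the formula intends.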

NOTE 2: Also, try using matrices instead of data frames. Data frames are a lot slower because they have to manage the columns separately.
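
A rough way to check this on your own data is to convert once with `as.matrix` and time the same loop on both inputs. The helper `loop_A` below is a hypothetical wrapper around the loop from the question, and `dx` is assumed to hold one weight per row. Timings vary by machine, so none are shown.

```r
set.seed(42)
X_df <- data.frame(replicate(30, sample(0:1, 100, replace = TRUE)))
dx <- sample(1:100, 100, replace = TRUE)  # assumption: one weight per row

# Hypothetical helper wrapping the loop from the question
loop_A <- function(X, dx) {
  att_num <- dim(X)[2]
  A <- matrix(0, att_num, att_num)
  for (i in seq(att_num)) A[, i] <- colSums(dx * X * X[, i])
  A
}

X_mat <- as.matrix(X_df)  # convert once, outside the loop

t_df <- system.time(A_df <- loop_A(X_df, dx))     # data frame input
t_mat <- system.time(A_mat <- loop_A(X_mat, dx))  # matrix input

same_result <- isTRUE(all.equal(A_df, A_mat))
```

Comparing `t_df` and `t_mat` shows how much of the cost comes from the data frame arithmetic alone, before any vectorization.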
