
Get residuals from least squares fit using less memory

I'm fitting least squares models with the same predictor and a large number of responses, and all I need are the residuals. The qr.resid function is the simplest way I've found to do this, but it uses more memory than necessary: the Fortran code it calls returns some extra results, which are discarded before qr.resid returns.

This means that when the number of responses is big enough, my system has to start swapping memory and it takes a very long time to get to the answer. Doing it one by one (with a loop in R) is faster because it doesn't swap, but is (presumably) slower than it would be to do the whole thing in a vectorized way.

That is, this version is faster when y is small enough not to cause swapping:

reslsfit1 <- function (x, y) {
  qr.resid(qr(x),y)
}

But this version is faster for large y because it doesn't swap:

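reslsfit2 <- function (x, y) {
  x <- unname(x)
  y <- unname(y)
  # preallocate a numeric (not logical) result matrix
  out <- matrix(NA_real_, ncol=ncol(y), nrow=nrow(y))
  qrx <- qr(x)  # factor x once and reuse it for every column
  # seq_len() is safe even when y has zero columns
  for(i in seq_len(ncol(y))) {out[,i] <- qr.resid(qrx, y[,i])}
  out
}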
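
A possible middle ground between the two versions above is to process the columns of y in blocks, so each call to qr.resid stays vectorized but only touches a bounded amount of memory. The sketch below is only an illustration: the name reslsfit_chunked and the chunk_size argument are made up here, and a good block size would have to be found by experiment on the machine in question.

```r
# Hypothetical helper (not part of base R): residuals computed block by block.
# chunk_size is an assumed tuning parameter, the number of columns per block.
reslsfit_chunked <- function(x, y, chunk_size = 1000L) {
  y <- as.matrix(y)
  out <- matrix(NA_real_, nrow = nrow(y), ncol = ncol(y))
  if (ncol(y) == 0L) return(out)      # nothing to do for an empty response
  qrx <- qr(x)                        # factor x once, reuse for every block
  for (s in seq(1L, ncol(y), by = chunk_size)) {
    cols <- s:min(s + chunk_size - 1L, ncol(y))
    # drop = FALSE keeps a one-column block as a matrix
    out[, cols] <- qr.resid(qrx, y[, cols, drop = FALSE])
  }
  out
}

# Small usage example
set.seed(1)
x_small <- rnorm(50)
y_small <- matrix(rnorm(50 * 23), ncol = 23)
r_chunked <- reslsfit_chunked(x_small, y_small, chunk_size = 5L)
```

With chunk_size = 1 this reduces to the column-by-column loop, and with chunk_size at least ncol(y) it reduces to a single qr.resid call.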

Any suggestions? I'd prefer to use an existing function (or at least, an existing algorithm) rather than rolling my own because of the potential for numerical issues.

Here's code that recreates the issue; you'll have to make N big enough for your system.

set.seed(5)
n <- 1000
N <- 10000 # make this big enough for your system to swap
y <- matrix(rnorm(n*N), ncol=N)
x <- rnorm(n)
r1 <- reslsfit1(x,y)
r2 <- reslsfit2(x,y)

Interestingly, qr.fitted and qr.coef don't seem to take as much memory, so either of these works for me without running out of memory. The qr.coef version seems slightly faster.

reslsfit3 <- function(x,y) { y - qr.fitted(qr(x), y) }
reslsfit4 <- function(x,y) { y - as.matrix(x) %*% qr.coef(qr(x), y) }
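
Before switching, it seems worth checking on a small problem that subtracting the fitted values gives the same residuals as qr.resid. The snippet below is a rough check with made-up sizes (n = 100, N = 20). The design matrix is written as as.matrix(x) so the product is well defined when x is a plain vector, and the variable names are only illustrative.

```r
set.seed(5)
n <- 100
N <- 20
y <- matrix(rnorm(n * N), ncol = N)
x <- rnorm(n)

qrx <- qr(x)                                        # one factorization, shared
r_resid  <- qr.resid(qrx, y)                        # direct residuals
r_fitted <- y - qr.fitted(qrx, y)                   # y minus fitted values
r_coef   <- y - as.matrix(x) %*% qr.coef(qrx, y)    # y minus X times coefficients

# all.equal() compares up to a small numerical tolerance
stopifnot(isTRUE(all.equal(r_resid, r_fitted)))
stopifnot(isTRUE(all.equal(r_resid, r_coef)))
```

This only shows that the three routes agree on well-conditioned data. For a rank-deficient x, qr.coef can return NA coefficients, so the coefficient route would need extra care there.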

Why does your y-variable have many columns and your x-variable only one? Did you get them mixed up?

Does lsfit do any better, say via lsfit()$residuals? It still uses a QR decomposition, so it may not help.
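
For anyone trying the lsfit route, here is a minimal sketch of what the call might look like. Note that lsfit adds an intercept by default, so intercept = FALSE is needed to match qr.resid(qr(x), y), which fits no intercept. lsfit also labels the response columns, hence the unname. Whether this actually uses less memory is not tested here.

```r
set.seed(5)
n <- 100
N <- 20
y <- matrix(rnorm(n * N), ncol = N)
x <- rnorm(n)

r_qr <- qr.resid(qr(x), y)
# intercept = FALSE: fit through the origin, as qr(x) does for a bare vector x
r_ls <- unname(lsfit(x, y, intercept = FALSE)$residuals)

stopifnot(isTRUE(all.equal(r_qr, r_ls)))
```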
