
Get residuals from least squares fit using less memory

I'm fitting least squares models with the same predictor and a large number of responses, and all I need are the residuals. The qr.resid function is the simplest way I've found to do this, but it takes more memory than necessary: the Fortran code it calls returns some unnecessary objects, which are discarded before qr.resid returns the result.

This means that when the number of responses is large enough, my system starts swapping memory and takes a very long time to get to the answer. Doing it one column at a time (with a loop in R) is faster because it doesn't swap, but it is (presumably) slower than doing the whole thing in a vectorized way would be without the swapping.

That is, this version is faster when y is small enough not to cause swapping:

reslsfit1 <- function (x, y) {
  qr.resid(qr(x),y)
}

But this version is faster for large y because it doesn't swap:

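reslsfit2 <- function (x, y) {
  x <- unname(x)
  y <- unname(y)
  # preallocate a double matrix (NA_real_) so it is not coerced from logical later
  out <- matrix(NA_real_, ncol=ncol(y), nrow=nrow(y))
  qrx <- qr(x)  # decompose the shared predictor only once
  # one response column per call, so qr.resid only copies a single column at a time
  for(i in seq_len(ncol(y))) {out[,i] <- qr.resid(qrx, y[,i])}
  out
}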
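Why the column-by-column loop returns the same residuals as the all-at-once call follows from how QR residuals are formed. A sketch, assuming x has full column rank p and writing its thin QR decomposition as X = QR, with Q an n × p matrix with orthonormal columns and y_j the j-th of the N response columns (this notation is introduced here, not taken from the question):

$$
\hat{Y} = Q Q^{\top} Y, \qquad
E = Y - \hat{Y} = \left(I_n - Q Q^{\top}\right) Y, \qquad
e_j = \left(I_n - Q Q^{\top}\right) y_j \quad (j = 1, \dots, N).
$$

Each residual column e_j depends only on its own response y_j and the shared factor Q, so splitting Y into single columns (or blocks of columns) changes peak memory but not the result. In practice qr.resid does not form Q explicitly; it applies the Householder transformations stored in the qr object to each column.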

Any suggestions? I'd prefer to use an existing function (or at least an existing algorithm) rather than rolling my own, because of the potential for numerical issues.
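One middle ground that still relies only on the existing qr.resid function is to process y in blocks of columns instead of all at once or one column at a time. This is a minimal sketch, not code from the original post: reslsfit_chunked and chunk_size are hypothetical names, and the default of 500 columns per block is an arbitrary assumption that would need tuning to the memory available. Because qr.resid treats each column of y independently, the blocks should yield the same residuals as a single call.

```r
# Hypothetical helper: residuals from a shared predictor x for many responses y,
# computed with qr.resid() on blocks of columns to bound peak memory.
# chunk_size (assumed default 500) is the number of columns handled per call.
reslsfit_chunked <- function(x, y, chunk_size = 500L) {
  y <- unname(as.matrix(y))
  if (ncol(y) == 0L) return(y)      # nothing to fit
  qrx <- qr(x)                      # decompose the shared predictor once
  out <- matrix(NA_real_, nrow = nrow(y), ncol = ncol(y))
  for (s in seq(1L, ncol(y), by = chunk_size)) {
    cols <- s:min(s + chunk_size - 1L, ncol(y))
    # drop = FALSE keeps a one-column block as a matrix
    out[, cols] <- qr.resid(qrx, y[, cols, drop = FALSE])
  }
  out
}
```

With chunk_size = 1 this reduces to the loop in reslsfit2, and with chunk_size >= ncol(y) it reduces to reslsfit1, so the block size trades loop overhead against the temporary copies qr.resid makes of each block.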

Here's code that recreates the issue; you'll have to make N big enough for your system.

set.seed(5)
n <- 1000
N <- 10000 # make this big enough for your system to swap
y <- matrix(rnorm(n*N), ncol=N)
x <- rnorm(n)
r1 <- reslsfit1(x,y)
r2 <- reslsfit2(x,y)

Interestingly, qr.fitted and qr.coef don't seem to take as much memory, so either of these works for me without running out of memory. The qr.coef version seems slightly faster.

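reslsfit3 <- function(x,y) { y - qr.fitted(qr(x), y) }
# fitted values are x %*% coefficients; assumes x has full column rank (qr.coef gives NA for aliased columns)
reslsfit4 <- function(x,y) { y - x %*% qr.coef(qr(x), y) }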
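Before relying on reslsfit3 or reslsfit4 at full scale, it may be worth confirming on a small problem that they reproduce the qr.resid residuals. This is a sketch under the assumption that x has full column rank (as rnorm data almost surely does); the sizes are deliberately tiny so nothing swaps, and reslsfit4 uses x for the design matrix.

```r
# Small-scale check that the four approaches in the question agree.
reslsfit1 <- function(x, y) qr.resid(qr(x), y)

reslsfit2 <- function(x, y) {
  x <- unname(x)
  y <- unname(y)
  out <- matrix(NA_real_, ncol = ncol(y), nrow = nrow(y))
  qrx <- qr(x)
  for (i in seq_len(ncol(y))) out[, i] <- qr.resid(qrx, y[, i])
  out
}

reslsfit3 <- function(x, y) y - qr.fitted(qr(x), y)

# fitted values are x %*% coefficients; assumes x has full column rank,
# since qr.coef() returns NA for aliased columns
reslsfit4 <- function(x, y) y - x %*% qr.coef(qr(x), y)

set.seed(5)
n <- 100
N <- 20
y <- matrix(rnorm(n * N), ncol = N)
x <- rnorm(n)

r1 <- reslsfit1(x, y)
# compare each alternative with qr.resid() up to floating-point tolerance
checks <- c(loop   = isTRUE(all.equal(r1, reslsfit2(x, y))),
            fitted = isTRUE(all.equal(r1, reslsfit3(x, y))),
            coef   = isTRUE(all.equal(r1, reslsfit4(x, y))))
print(checks)
```

With a full-rank x all three entries should be TRUE. If x were rank-deficient, reslsfit4 would produce NA residuals while reslsfit3 would not, which is one reason to prefer qr.fitted when the rank is not guaranteed.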

Why does your y variable have many columns and your x variable only one? Did you get them mixed up?

Does lsfit do any better, say via lsfit()$residuals? It still uses a QR decomposition, so it may not help.
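For reference, a sketch of the lsfit route suggested in the comment above. The fits in the question have no intercept (qr(x) is taken on x alone), so intercept = FALSE is needed to get comparable residuals. Since lsfit still goes through a QR decomposition internally, this only shows how to obtain matching residuals; it is not a claim that lsfit lowers peak memory.

```r
# Residuals via lsfit(), compared with qr.resid() on a small problem.
set.seed(5)
n <- 100
N <- 20
y <- matrix(rnorm(n * N), ncol = N)
x <- rnorm(n)

fit <- lsfit(x, y, intercept = FALSE)  # no intercept, to match qr(x)
r_lsfit <- fit$residuals
r_qr <- qr.resid(qr(x), y)

# lsfit() may attach column names to the residuals, so compare values only
all.equal(r_lsfit, r_qr, check.attributes = FALSE)
```

If an intercept were wanted in both fits, the qr.resid side would use qr(cbind(1, x)) and lsfit would keep its default intercept = TRUE.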

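Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.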

 
© 2020-2024 STACKOOM.COM