简体   繁体   English

利用QR分解进行R的多元回归分析

[英]Multiple regression analysis in R using QR decomposition

I am trying to write a function for solving multiple regression using QR decomposition. 我正在尝试编写一个使用QR分解来解决多元回归的函数。 Input: y vector and X matrix; 输入:y向量和X矩阵; output: b, e, R^2. 输出:b,e,R ^ 2。 So far I`ve got this and am terribly stuck; 到目前为止,我已经得到了这个并且非常困难; I think I have made everything way too complicated: 我想我已经让事情变得太复杂了:

QR.regression <- function(y, X) {
X <- as.matrix(X)
y <- as.vector(y)
p <- as.integer(ncol(X))
if (is.na(p)) stop("ncol(X) is invalid")
n <- as.integer(nrow(X))
if (is.na(n)) stop("nrow(X) is invalid")
nr <- length(y)
nc <- NCOL(X)

# Householder 
for (j in seq_len(nc)) {
id <- seq.int(j, nr)
sigma <- sum(X[id, j]^2)
s <- sqrt(sigma)
diag_ej <- X[j, j]
gamma <- 1.0 / (sigma + abs(s * diag_ej))
kappa <- if (diag_ej < 0) s else -s
X[j,j] <- X[j, j] - kappa
if (j < nc)
for (k in seq.int(j+1, nc)) {
yPrime <- sum(X[id,j] * X[id,k]) * gamma
X[id,k] <- X[id,k] - X[id,j] * yPrime
}
yPrime <- sum(X[id,j] * y[id]) * gamma
y[id] <- y[id] - X[id,j] * yPrime
X[j,j] <- kappa
} # end of Householder transformation

rss <- sum(y[seq.int(nc+1, nr)]^2)  # residuals sum of squares
e <- rss/nr
e <- mean(residuals(QR.regression)^2)
beta <- solve(t(X) %*% X, t(X) %*% y)
for (i in seq_len(ncol(X))) # set zeros in the lower triangular side of X
X[seq.int(i+1, nr),i] <- 0
Rsq <- (X[1:nc,1:nc])^2
return(list(Rsq=Rsq, y = y, beta = beta, e = e))
}


UPDATE:
my.QR <- function(y, X) {
X <- as.matrix(X)
y <- as.vector(y)
p <- as.integer(ncol(X))
if (is.na(p)) stop("ncol(X) is invalid")
n <- as.integer(nrow(X))
if (is.na(n)) stop("nrow(X) is invalid")
qr.X <- qr(X)
b <- solve(t(X) %*% X, t(X) %*% y)
e <- as.vector(y - X %*% beta) #e
R2 <- (X[1:p, 1:p])^2
return(list(b = b, e= e, R2 = R2 ))
}

X <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
y <- c(1,2,3,4)
my.QR(X, y)

It all depends on how much of R's built-in facility you are allowed to use to solve this problem. 这一切都取决于你可以用多少R的内置设施来解决这个问题。 I already know that lm is not allowed, so here is the rest of the story. 我已经知道lm是不被允许的,所以这是故事的其余部分。


If you are allowed to use any other routines than lm 如果允许您使用除lm任何其他例程

Then you can simply use lm.fit , .lm.fit or lsfit for QR-based ordinary least squares solving. 然后你可以简单地使用lm.fit.lm.fitlsfit进行基于QR的普通最小二乘求解。

lm.fit(X, y)
.lm.fit(X, y)
lsfit(X, y, intercept = FALSE)

Among those, .lm.fit is the most light-weighed, while lm.fit and lsfit are pretty similar. 其中, .lm.fit是最轻量级的,而lm.fitlsfit非常相似。 The following is what we can do via .lm.fit : 以下是我们可以通过.lm.fit做的.lm.fit

f1 <- function (X, y) {
  z <- .lm.fit(X, y)
  RSS <- crossprod(z$residuals)[1]
  TSS <- crossprod(y - mean(y))[1]
  R2 <- 1 - RSS / TSS
  list(coefficients = z$coefficients, residuals = z$residuals, R2 = R2)
  }

In the question by your fellow classmate: Toy R function for solving ordinary least squares by singular value decomposition , I have already used this to check the correctness of SVD approach. 在你的同学的问题中: 玩具R函数通过奇异值分解求解普通最小二乘法 ,我已经用它来检查SVD方法的正确性。


If you are not allowed to use R's built-in QR factorization routine qr.default 如果您不允许使用R的内置QR分解例程qr.default

If .lm.fit is not allowed, but qr.default is, then it is also not that complicated. 如果.lm.fit是不允许的,但qr.default是,那么它也没有那么复杂。

f2 <- function (X, y) {
  ## QR factorization `X = QR`
  QR <- qr.default(X)
  ## After rotation of `X` and `y`, solve upper triangular system `Rb = Q'y` 
  b <- backsolve(QR$qr, qr.qty(QR, y))
  ## residuals
  e <- as.numeric(y - X %*% b)
  ## R-squared
  RSS <- crossprod(e)[1]
  TSS <- crossprod(y - mean(y))[1]
  R2 <- 1 - RSS / TSS
  ## multiple return
  list(coefficients = b, residuals = e, R2 = R2)
  }

If you further want variance-covariance of estimated coefficients, follow How to calculate variance of least squares estimator using QR decomposition in R? 如果您还需要估计系数的方差 - 协方差,请按照如何使用R中的QR分解计算最小二乘估计的方差? .


If you are not even allowed to use qr.default 如果您甚至不允许使用qr.default

Then we have to write QR decomposition ourselves. 然后我们必须自己编写QR分解。 Writing a Householder QR factorization function in R code is giving this. 在R代码中编写Householder QR分解函数就是这样。

Using the function myqr there, we can write 在那里使用myqr函数,我们可以写

f3 <- function (X, y) {
  ## our own QR factorization
  ## complete Q factor is not required
  QR <- myqr(X, complete = FALSE)
  Q <- QR$Q
  R <- QR$R
  ## rotation of `y`
  Qty <- as.numeric(crossprod(Q, y))
  ## solving upper triangular system
  b <- backsolve(R, Qty)
  ## residuals
  e <- as.numeric(y - X %*% b)
  ## R-squared
  RSS <- crossprod(e)[1]
  TSS <- crossprod(y - mean(y))[1]
  R2 <- 1 - RSS / TSS
  ## multiple return
  list(coefficients = b, residuals = e, R2 = R2)
  }

f3 is not extremely efficient, as we have formed Q explicitly, even though it is the thin- Q factor. f3并不是非常有效,因为我们明确地形成了Q ,即使它是薄Q因子。 In principle, we should rotate y along with the QR factorization of X , thus Q needs not be formed. 原则上,我们应该将y随着X的QR因子分解而旋转,因此不需要形成Q


If you want to fix your existing code 如果要修复现有代码

This requires some debugging effort so would take some time. 这需要一些调试工作,因此需要一些时间。 I will make another answer regarding this later. 我稍后会就此作出另一个答案。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM