[英]Multiple regression analysis in R using QR decomposition
I am trying to write a function for solving multiple regression using QR decomposition. 我正在尝试编写一个使用QR分解来解决多元回归的函数。 Input: y vector and X matrix; 输入:y向量和X矩阵; output: b, e, R^2. 输出:b,e,R ^ 2。 So far I`ve got this and am terribly stuck; 到目前为止,我已经得到了这个并且非常困难; I think I have made everything way too complicated: 我想我已经让事情变得太复杂了:
QR.regression <- function(y, X) {
X <- as.matrix(X)
y <- as.vector(y)
p <- as.integer(ncol(X))
if (is.na(p)) stop("ncol(X) is invalid")
n <- as.integer(nrow(X))
if (is.na(n)) stop("nrow(X) is invalid")
nr <- length(y)
nc <- NCOL(X)
# Householder
for (j in seq_len(nc)) {
id <- seq.int(j, nr)
sigma <- sum(X[id, j]^2)
s <- sqrt(sigma)
diag_ej <- X[j, j]
gamma <- 1.0 / (sigma + abs(s * diag_ej))
kappa <- if (diag_ej < 0) s else -s
X[j,j] <- X[j, j] - kappa
if (j < nc)
for (k in seq.int(j+1, nc)) {
yPrime <- sum(X[id,j] * X[id,k]) * gamma
X[id,k] <- X[id,k] - X[id,j] * yPrime
}
yPrime <- sum(X[id,j] * y[id]) * gamma
y[id] <- y[id] - X[id,j] * yPrime
X[j,j] <- kappa
} # end of Householder transformation
rss <- sum(y[seq.int(nc+1, nr)]^2) # residuals sum of squares
e <- rss/nr
e <- mean(residuals(QR.regression)^2)
beta <- solve(t(X) %*% X, t(X) %*% y)
for (i in seq_len(ncol(X))) # set zeros in the lower triangular side of X
X[seq.int(i+1, nr),i] <- 0
Rsq <- (X[1:nc,1:nc])^2
return(list(Rsq=Rsq, y = y, beta = beta, e = e))
}
UPDATE:
my.QR <- function(y, X) {
X <- as.matrix(X)
y <- as.vector(y)
p <- as.integer(ncol(X))
if (is.na(p)) stop("ncol(X) is invalid")
n <- as.integer(nrow(X))
if (is.na(n)) stop("nrow(X) is invalid")
qr.X <- qr(X)
b <- solve(t(X) %*% X, t(X) %*% y)
e <- as.vector(y - X %*% beta) #e
R2 <- (X[1:p, 1:p])^2
return(list(b = b, e= e, R2 = R2 ))
}
X <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
y <- c(1,2,3,4)
my.QR(X, y)
It all depends on how much of R's built-in facility you are allowed to use to solve this problem. 这一切都取决于你可以用多少R的内置设施来解决这个问题。 I already know that lm
is not allowed, so here is the rest of the story. 我已经知道lm
是不被允许的,所以这是故事的其余部分。
If you are allowed to use any other routines than lm
如果允许您使用除lm
任何其他例程
Then you can simply use lm.fit
, .lm.fit
or lsfit
for QR-based ordinary least squares solving. 然后你可以简单地使用lm.fit
, .lm.fit
或lsfit
进行基于QR的普通最小二乘求解。
lm.fit(X, y)
.lm.fit(X, y)
lsfit(X, y, intercept = FALSE)
Among those, .lm.fit
is the most light-weighed, while lm.fit
and lsfit
are pretty similar. 其中, .lm.fit
是最轻量级的,而lm.fit
和lsfit
非常相似。 The following is what we can do via .lm.fit
: 以下是我们可以通过.lm.fit
做的.lm.fit
:
f1 <- function (X, y) {
z <- .lm.fit(X, y)
RSS <- crossprod(z$residuals)[1]
TSS <- crossprod(y - mean(y))[1]
R2 <- 1 - RSS / TSS
list(coefficients = z$coefficients, residuals = z$residuals, R2 = R2)
}
In the question by your fellow classmate: Toy R function for solving ordinary least squares by singular value decomposition , I have already used this to check the correctness of SVD approach. 在你的同学的问题中: 玩具R函数通过奇异值分解求解普通最小二乘法 ,我已经用它来检查SVD方法的正确性。
If you are not allowed to use R's built-in QR factorization routine qr.default
如果您不允许使用R的内置QR分解例程qr.default
If .lm.fit
is not allowed, but qr.default
is, then it is also not that complicated. 如果.lm.fit
是不允许的,但qr.default
是,那么它也没有那么复杂。
f2 <- function (X, y) {
## QR factorization `X = QR`
QR <- qr.default(X)
## After rotation of `X` and `y`, solve upper triangular system `Rb = Q'y`
b <- backsolve(QR$qr, qr.qty(QR, y))
## residuals
e <- as.numeric(y - X %*% b)
## R-squared
RSS <- crossprod(e)[1]
TSS <- crossprod(y - mean(y))[1]
R2 <- 1 - RSS / TSS
## multiple return
list(coefficients = b, residuals = e, R2 = R2)
}
If you further want variance-covariance of estimated coefficients, follow How to calculate variance of least squares estimator using QR decomposition in R? 如果您还需要估计系数的方差 - 协方差,请按照如何使用R中的QR分解计算最小二乘估计的方差? . 。
If you are not even allowed to use qr.default
如果您甚至不允许使用qr.default
Then we have to write QR decomposition ourselves. 然后我们必须自己编写QR分解。 Writing a Householder QR factorization function in R code is giving this. 在R代码中编写Householder QR分解函数就是这样。
Using the function myqr
there, we can write 在那里使用myqr
函数,我们可以写
f3 <- function (X, y) {
## our own QR factorization
## complete Q factor is not required
QR <- myqr(X, complete = FALSE)
Q <- QR$Q
R <- QR$R
## rotation of `y`
Qty <- as.numeric(crossprod(Q, y))
## solving upper triangular system
b <- backsolve(R, Qty)
## residuals
e <- as.numeric(y - X %*% b)
## R-squared
RSS <- crossprod(e)[1]
TSS <- crossprod(y - mean(y))[1]
R2 <- 1 - RSS / TSS
## multiple return
list(coefficients = b, residuals = e, R2 = R2)
}
f3
is not extremely efficient, as we have formed Q
explicitly, even though it is the thin- Q
factor. f3
并不是非常有效,因为我们明确地形成了Q
,即使它是薄Q
因子。 In principle, we should rotate y
along with the QR factorization of X
, thus Q
needs not be formed. 原则上,我们应该将y
随着X
的QR因子分解而旋转,因此不需要形成Q
If you want to fix your existing code 如果要修复现有代码
This requires some debugging effort so would take some time. 这需要一些调试工作,因此需要一些时间。 I will make another answer regarding this later. 我稍后会就此作出另一个答案。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.