
Get hat matrix from QR decomposition for weighted least square regression

I am trying to extend the lwr() function of the package McSptial, which fits weighted regressions as non-parametric estimation. In the core of the lwr() function, it inverts a matrix using solve() instead of a QR decomposition, resulting in numerical instability. I would like to change it but can't figure out how to get the hat matrix (or other derivatives) from the QR decomposition afterward.

With data:

set.seed(0); xmat <- matrix(rnorm(500), nrow=50)    ## model matrix
y <- rowSums(rep(2:11,each=50)*xmat)    ## arbitrary values to let `lm.wfit` work
w <- runif(50, 1, 2)    ## weights

The lwr() function goes as follows:

xmat2 <- w * xmat                      ## W X (rows of X scaled by weights)
xx <- solve(crossprod(xmat, xmat2))    ## (X'WX)^{-1} by explicit inversion
xmat1 <- tcrossprod(xx, xmat2)         ## (X'WX)^{-1} X'W
vmat <- tcrossprod(xmat1)              ## (X'WX)^{-1} X'W^2 X (X'WX)^{-1}

I need the value of, for instance:

sum((xmat[1,] %*% xmat1)^2)
sqrt(diag(vmat))

For the moment I use reg <- lm.wfit(x=xmat, y=y, w=w) but cannot manage to get back what seems to me to be the hat matrix (xmat1) out of it.

This old question is a continuation of another old question I have just answered: Compute projection / hat matrix via QR factorization, SVD (and Cholesky factorization?). That answer discusses three options for computing the hat matrix of an ordinary least squares problem, while this question is in the context of weighted least squares. The result and method in that answer will be the basis of my answer here. Specifically, I will only demonstrate the QR approach.

For weighted least squares with positive weights w, let W = diag(w). The fitted values are y_hat = X (X'WX)^{-1} X'W y, so the hat matrix is

H = X (X'WX)^{-1} X'W.

Define the row-rescaled matrix X_tilde = W^{1/2} X and take its thin QR factorization X_tilde = Q R. Substituting X = W^{-1/2} X_tilde gives

H = W^{-1/2} Q Q' W^{1/2} = Q1 Q2',    where Q1 = W^{-1/2} Q and Q2 = W^{1/2} Q.

OP mentioned that we can use lm.wfit to compute the QR factorization, but we could also do so using qr.default ourselves, which is the way I will show.
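For reference, lm.wfit already returns this factorization: its $qr component is the QR factorization of sqrt(w) * xmat, since lm.wfit rescales rows by the square roots of the weights internally. A minimal sketch of pulling the thin Q out of it (using the xmat, y, w from the question):

reg <- lm.wfit(x = xmat, y = y, w = w)
QRw <- reg$qr    ## QR factorization of sqrt(w) * xmat
Qw <- qr.qy(QRw, diag(1, nrow = nrow(xmat), ncol = reg$rank))    ## thin Q factor

From this Qw, the same Q1 / Q2 construction shown below applies.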


Before I proceed, I need to point out that OP's code is not doing what he thinks. xmat1 is not the hat matrix; instead, xmat %*% xmat1 is. vmat is not the hat matrix either, although I don't know what it really is. So I don't understand what these are supposed to be:

sum((xmat[1,] %*% xmat1)^2)
sqrt(diag(vmat))

The second one looks like the diagonal of the hat matrix, but as I said, vmat is not the hat matrix. Anyway, I will proceed with the correct computation of the hat matrix, and show how to get its diagonal and trace.


Consider a toy model matrix X and some uniformly distributed, positive weights w:

set.seed(0); X <- matrix(rnorm(500), nrow = 50)
w <- runif(50, 1, 2)    ## weights must be positive
rw <- sqrt(w)    ## square root of weights

We first obtain X1 (X_tilde in the math paragraph above) by row rescaling of X:

X1 <- rw * X

Then we perform QR factorization of X1. As discussed in my linked answer, we can do this factorization with or without column pivoting. lm.fit and lm.wfit (hence lm) do not use pivoting, but here I will use the pivoted factorization as a demonstration.

QR <- qr.default(X1, LAPACK = TRUE)    ## pivoted QR factorization via LAPACK
Q <- qr.qy(QR, diag(1, nrow = nrow(QR$qr), ncol = QR$rank))    ## thin Q factor
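As a quick sanity check (a sketch, not needed for the computation): pivoting does not affect the projector Q %*% t(Q), because Q spans the column space of X1 either way, so the hat matrix we build below is the same with or without pivoting.

QR0 <- qr.default(X1)    ## non-pivoted (LINPACK) factorization
Q0 <- qr.qy(QR0, diag(1, nrow = nrow(X1), ncol = QR0$rank))
all.equal(tcrossprod(Q), tcrossprod(Q0))
# [1] TRUE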

Note that we did not go on to compute tcrossprod(Q) as in the linked answer, because that is for ordinary least squares. For weighted least squares, we want Q1 and Q2:

Q1 <- (1 / rw) * Q    ## W^{-1/2} Q
Q2 <- rw * Q          ## W^{1/2} Q

If we only want the diagonal and trace of the hat matrix, there is no need to do a matrix multiplication to first form the full hat matrix. We can use

d <- rowSums(Q1 * Q2)  ## diagonal
# [1] 0.20597777 0.26700833 0.30503459 0.30633288 0.22246789 0.27171651
# [7] 0.06649743 0.20170817 0.16522568 0.39758645 0.17464352 0.16496177
#[13] 0.34872929 0.20523690 0.22071444 0.24328554 0.32374295 0.17190937
#[19] 0.12124379 0.18590593 0.13227048 0.10935003 0.09495233 0.08295841
#[25] 0.22041164 0.18057077 0.24191875 0.26059064 0.16263735 0.24078776
#[31] 0.29575555 0.16053372 0.11833039 0.08597747 0.14431659 0.21979791
#[37] 0.16392561 0.26856497 0.26675058 0.13254903 0.26514759 0.18343306
#[43] 0.20267675 0.12329997 0.30294287 0.18650840 0.17514183 0.21875637
#[49] 0.05702440 0.13218959

edf <- sum(d)  ## trace, sum of diagonals
# [1] 10
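To cross-check d against base R (a sketch; hat values do not depend on the response, so any made-up y will do):

y0 <- rnorm(50)    ## arbitrary response, only needed so that `lm` can fit
fit <- lm(y0 ~ X - 1, weights = w)    ## weighted fit without intercept
all.equal(unname(hatvalues(fit)), d)
# [1] TRUE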

In linear regression, d is the influence of each datum, and it is useful for producing confidence intervals (using sqrt(d)) and standardised residuals (using sqrt(1 - d)). The trace is the effective number of parameters, or effective degrees of freedom, of the model (hence I call it edf). We see that edf = 10, because we have used 10 parameters: X has 10 columns and it is not rank-deficient.
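For example, standardised residuals of a weighted fit can be computed from d like this (a sketch with a made-up response y1; in weighted least squares the residual variance of datum i is proportional to (1 - d[i]) / w[i]):

y1 <- rowSums(X) + rnorm(50)    ## made-up response for illustration
fit1 <- lm.wfit(x = X, y = y1, w = w)
sigma2 <- sum(w * fit1$residuals^2) / (nrow(X) - edf)       ## residual variance estimate
rstd <- sqrt(w) * fit1$residuals / sqrt(sigma2 * (1 - d))   ## standardised residuals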

Usually d and edf are all we need. Only in rare cases do we want the full hat matrix. To get it, we need an expensive matrix multiplication:

H <- tcrossprod(Q1, Q2)
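To double check (a sketch), H agrees with the textbook weighted least squares projector X (X'WX)^{-1} X'W computed directly; of course, the direct route below goes through solve() and is exactly the numerically unstable path we want to avoid:

H0 <- X %*% solve(crossprod(X, w * X), t(w * X))    ## X (X'WX)^{-1} X'W
all.equal(H, H0)
# [1] TRUE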

The hat matrix is particularly useful in helping us understand whether a model is local / sparse or not. Let's plot this matrix (read ?image for details and examples on how to plot a matrix in the correct orientation):

image(t(H)[ncol(H):1,])

[image: the hat matrix H, plotted with image()]

We see that this matrix is completely dense. This means that prediction at each datum depends on all data, i.e., prediction is not local. If we compare with other non-parametric prediction methods, like kernel regression, LOESS, P-splines (penalized B-spline regression) and wavelets, we will observe a sparse hat matrix. Therefore, those methods are known as local fitting.
