Get hat matrix from QR decomposition for weighted least square regression
I am trying to extend the lwr() function of the package McSptial, which fits weighted regressions as non-parametric estimation. In the core of the lwr() function, it inverts a matrix using solve() instead of a QR decomposition, resulting in numerical instability. I would like to change that but cannot figure out how to get the hat matrix (or other derived matrices) from the QR decomposition afterward.
With data:
set.seed(0); xmat <- matrix(rnorm(500), nrow=50) ## model matrix
y <- rowSums(rep(2:11,each=50)*xmat) ## arbitrary values to let `lm.wfit` work
w <- runif(50, 1, 2) ## weights
The lwr() function goes as follows:
xmat2 <- w * xmat
xx <- solve(crossprod(xmat, xmat2))
xmat1 <- tcrossprod(xx, xmat2)
vmat <- tcrossprod(xmat1)
I need the value of, for instance:
sum((xmat[1,] %*% xmat1)^2)
sqrt(diag(vmat))
For the moment I use reg <- lm.wfit(x=xmat, y=y, w=w) but cannot manage to get back what seems to me to be the hat matrix (xmat1) out of it.
This old question is a continuation of another old question I have just answered: Compute projection / hat matrix via QR factorization, SVD (and Cholesky factorization?). That answer discusses 3 options for computing the hat matrix for an ordinary least squares problem, while this question is in the context of weighted least squares. The result and method in that answer will be the basis of my answer here. Specifically, I will only demonstrate the QR approach.
OP mentioned that we can use lm.wfit to compute the QR factorization, but we could do so using qr.default ourselves, which is the way I will show.
Before I proceed, I need to point out that OP's code is not doing what he thinks. xmat1 is not the hat matrix; instead, xmat %*% xmat1 is. vmat is not the hat matrix either, although I don't know what it really is. So I don't understand what these are:
sum((xmat[1,] %*% xmat1)^2)
sqrt(diag(vmat))
The second one looks like the diagonal of the hat matrix, but as I said, vmat is not the hat matrix. Anyway, I will proceed with the correct computation of the hat matrix, and show how to get its diagonal and trace.
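To back up the claim that xmat %*% xmat1 is the hat matrix, here is a quick self-contained check (using the data from the question) that it is idempotent, which a projection matrix must be:

```r
## Reproduce the question's setup and verify that xmat %*% xmat1
## is idempotent (H %*% H == H), a defining property of a hat matrix.
set.seed(0); xmat <- matrix(rnorm(500), nrow = 50)
w <- runif(50, 1, 2)
xmat2 <- w * xmat
xx <- solve(crossprod(xmat, xmat2))
xmat1 <- tcrossprod(xx, xmat2)        ## (X'WX)^{-1} X'W
H <- xmat %*% xmat1                   ## X (X'WX)^{-1} X'W -- the hat matrix
all.equal(H %*% H, H)                 ## TRUE: idempotent
```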
Consider a toy model matrix X and some uniform, positive weights w:
set.seed(0); X <- matrix(rnorm(500), nrow = 50)
w <- runif(50, 1, 2) ## weights must be positive
rw <- sqrt(w) ## square root of weights
We first obtain X1 (the X_tilde in the LaTeX paragraph) by row-rescaling X:
X1 <- rw * X
Then we perform QR factorization on X1. As discussed in my linked answer, we can do this factorization with or without column pivoting. lm.fit or lm.wfit, hence lm, does no pivoting, but here I will use the pivoted factorization as a demonstration.
QR <- qr.default(X1, LAPACK = TRUE)
Q <- qr.qy(QR, diag(1, nrow = nrow(QR$qr), ncol = QR$rank))
Note that we did not go on to compute tcrossprod(Q) as in the linked answer, because that is for ordinary least squares. For weighted least squares, we want Q1 and Q2:
Q1 <- (1 / rw) * Q
Q2 <- rw * Q
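As a sanity check (my own addition, not part of the original answer), Q1 and Q2 indeed reproduce the textbook weighted hat matrix X (X'WX)^{-1} X'W computed naively:

```r
## Rebuild everything from scratch so this snippet runs on its own
set.seed(0); X <- matrix(rnorm(500), nrow = 50)
w <- runif(50, 1, 2); rw <- sqrt(w)
X1 <- rw * X
QR <- qr.default(X1, LAPACK = TRUE)
Q  <- qr.qy(QR, diag(1, nrow = nrow(QR$qr), ncol = QR$rank))
Q1 <- (1 / rw) * Q
Q2 <- rw * Q
H_qr  <- tcrossprod(Q1, Q2)                          ## QR-based hat matrix
H_ref <- X %*% solve(crossprod(X, w * X), t(w * X))  ## X (X'WX)^{-1} X'W
all.equal(H_qr, H_ref)                               ## TRUE
```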
If we only want the diagonal and trace of the hat matrix, there is no need to do a matrix multiplication to form the full hat matrix first. We can use
d <- rowSums(Q1 * Q2) ## diagonal
# [1] 0.20597777 0.26700833 0.30503459 0.30633288 0.22246789 0.27171651
# [7] 0.06649743 0.20170817 0.16522568 0.39758645 0.17464352 0.16496177
#[13] 0.34872929 0.20523690 0.22071444 0.24328554 0.32374295 0.17190937
#[19] 0.12124379 0.18590593 0.13227048 0.10935003 0.09495233 0.08295841
#[25] 0.22041164 0.18057077 0.24191875 0.26059064 0.16263735 0.24078776
#[31] 0.29575555 0.16053372 0.11833039 0.08597747 0.14431659 0.21979791
#[37] 0.16392561 0.26856497 0.26675058 0.13254903 0.26514759 0.18343306
#[43] 0.20267675 0.12329997 0.30294287 0.18650840 0.17514183 0.21875637
#[49] 0.05702440 0.13218959
edf <- sum(d) ## trace, sum of diagonals
# [1] 10
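For reassurance (my own addition), the same diagonal can be recovered from R's built-in hatvalues() on a weighted lm fit, using the y from the question:

```r
set.seed(0); X <- matrix(rnorm(500), nrow = 50)
y <- rowSums(rep(2:11, each = 50) * X)  ## y from the question
w <- runif(50, 1, 2)
fit <- lm(y ~ X - 1, weights = w)       ## -1: X already has all 10 columns
d2 <- unname(hatvalues(fit))            ## built-in weighted hat diagonal
d  <- rowSums(qr.Q(qr(sqrt(w) * X))^2)  ## diagonal via QR, as above
all.equal(d, d2)                        ## TRUE
```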
In linear regression, d is the influence of each datum, and it is useful for producing confidence intervals (using sqrt(d)) and standardised residuals (using sqrt(1 - d)). The trace is the effective number of parameters, or effective degrees of freedom, of the model (hence I call it edf). We see that edf = 10, because we have used 10 parameters: X has 10 columns and it is not rank-deficient.
Usually d and edf are all we need. Only in rare cases do we want the full hat matrix. To get it, we need an expensive matrix multiplication:
H <- tcrossprod(Q1, Q2)
The hat matrix is particularly useful in helping us understand whether a model is local / sparse or not. Let's plot this matrix (read ?image for details and examples on how to plot a matrix in the correct orientation):
image(t(H)[ncol(H):1,])
We see that this matrix is completely dense. This means that the prediction at each datum depends on all data, i.e., prediction is not local. In contrast, if we compare with other non-parametric prediction methods, like kernel regression, loess, P-splines (penalized B-spline regression) and wavelets, we will observe a sparse hat matrix. That is why those methods are known as local fitting.
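To illustrate the contrast (a minimal sketch of my own, not from the original answer), the smoother matrix of a simple Nadaraya-Watson kernel regression with a small bandwidth has rows that decay to zero away from the diagonal:

```r
x <- seq(0, 1, length.out = 50)
h <- 0.05                                  ## small bandwidth (illustrative choice)
K <- exp(-0.5 * (outer(x, x, "-") / h)^2)  ## Gaussian kernel weights
S <- K / rowSums(K)                        ## smoother ("hat") matrix; rows sum to 1
## entries far from the diagonal are essentially zero
max(abs(S[1, 30:50]))                      ## practically 0
image(t(S)[ncol(S):1, ])                   ## visibly banded, unlike the dense H above
```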