
Shaken faith in `qr()`

I have relied on the qr() function a lot in dealing with rank-deficient situations, but have recently run into some examples where it doesn't work correctly. Consider the matrix badX below:

badX <-
structure(c(-1.641906809157e-10, 0, 0, 0, 0, -0.5, 0, 0, -1.10482935525559e-16, 
            0, -3.06266685765538e-17, 0, -4.83736007092039e-17, 0, -3.14414492582296e-18, 
            -3.06158275836099e-18), dim = c(4L, 4L), dimnames = list(c("(Intercept)", 
            "A2", "A3", "B2"), NULL))

We cannot invert this matrix using solve():

solve(badX)
## Error in solve.default(badX): system is computationally singular: reciprocal condition number = 5.55308e-18
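The failure is consistent with the reciprocal condition number reported in the error; one can check it directly with rcond() (badX is repeated here so the snippet stands alone):

```r
# badX as defined above, repeated so this snippet is self-contained
badX <- structure(c(-1.641906809157e-10, 0, 0, 0,
                    0, -0.5, 0, 0,
                    -1.10482935525559e-16, 0, -3.06266685765538e-17, 0,
                    -4.83736007092039e-17, 0, -3.14414492582296e-18,
                    -3.06158275836099e-18), dim = c(4L, 4L))

# Estimated reciprocal condition number (1-norm); it is far below
# .Machine$double.eps, which is why solve() declares the system singular
rcond(badX)
```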

Yet qr() and its associated routines think this matrix has a rank of 4, and they can invert it:

qr(badX)$rank
## [1] 4

qr.solve(badX)
##             [,1] [,2]          [,3]          [,4]
## [1,] -6090479645    0  2.197085e+10  7.366741e+10
## [2,]           0   -2  0.000000e+00  0.000000e+00
## [3,]           0    0 -3.265128e+16  3.353179e+16
## [4,]           0    0  0.000000e+00 -3.266284e+17

This is a pretty ugly result. I have tried varying the tol argument, with no change in the results.

For context, the origin of this result is this contrast matrix:

badL <-
structure(c(0, 0, 0, 0, 0, -9.89189274870351e-11, 0, -5.55111512312578e-17, 
    -2.77555756156289e-17, 1.11022302462516e-16, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.25, 0, 0, 0, 0, -0.25, 0, 0, 
    0, 9.89189274870351e-11, 0, 5.55111512312578e-17, 2.77555756156289e-17, 
    -1.11022302462516e-16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, -4.23939184015843e-11, 0, -4.16333634234434e-17, -1.38777878078145e-17, 
    5.55111512312578e-17, 0, 0, 0, 0, 0, -4.23939184015843e-11, 0, 
    -4.16333634234434e-17, -1.38777878078145e-17, 5.55111512312578e-17, 
    0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 
    0, 0, 0, 0, 0, 0, 4.23939184015843e-11, 0, 4.16333634234434e-17, 
    1.38777878078145e-17, -5.55111512312578e-17, 0, 0, 0, 0, 0, -1.41313127284714e-11, 
    0, -6.93889390390723e-18, -6.93889390390723e-18, 1.38777878078145e-17, 
    4.23939184015843e-11, 0, 4.16333634234434e-17, 1.38777878078145e-17, 
    -5.55111512312578e-17, 0, 0, 0, 0, 0), dim = c(5L, 24L), dimnames = list(
    NULL, c("(Intercept)", "A2", "A3", "B2", "B3", "C2", "C3", 
    "A2:B2", "A3:B2", "A2:B3", "A3:B3", "A2:C2", "A3:C2", "A2:C3", 
    "A3:C3", "B2:C2", "B3:C2", "B2:C3", "B3:C3", "A2:B2:C2", 
    "A3:B2:C2", "A3:B3:C2", "A2:B2:C3", "A3:B2:C3")))

... from which I obtained the QR decomposition of its transpose, to find that it is supposedly of rank 4:

badQR <- qr(t(badL))
badQR$rank
## [1] 4

The above matrix badX is equal to qr.R(badQR)[1:4, 1:4], which, based on the rank calculation, was supposed to be a full-rank upper-triangular matrix.

My remedy seems to be to use zapsmall() so that I get the rank right...

qr(zapsmall(t(badL)))$rank
## [1] 1

My question is, why does this happen? If you look at badL, it's pretty clear that its rows are essentially zero except for the second row. I would have thought that qr()'s pivoting methods would work better with this. Is there a better way to get more reliable code?
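Part of the answer may be visible in the column norms of the R factor. If I understand the LINPACK routine dqrdc2 that qr() uses by default, its rank test compares each column's remaining norm against that same column's original norm, not against the largest column, so columns that are uniformly tiny but nonzero still count toward the rank. A quick look (badX repeated so the snippet is self-contained):

```r
badX <- structure(c(-1.641906809157e-10, 0, 0, 0,
                    0, -0.5, 0, 0,
                    -1.10482935525559e-16, 0, -3.06266685765538e-17, 0,
                    -4.83736007092039e-17, 0, -3.14414492582296e-18,
                    -3.06158275836099e-18), dim = c(4L, 4L))

# Euclidean norm of each column, and each norm relative to the largest:
# three columns are many orders of magnitude smaller than column 2,
# yet every column has a strictly positive norm of its own
norm2 <- sqrt(colSums(badX^2))
norm2 / max(norm2)
```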

I am running Windows 11 Pro, version 10.0.22000, build 22000. Here's my R system information.

R.Version()
## $platform
## [1] "x86_64-w64-mingw32"
## 
## $arch
## [1] "x86_64"
## 
## $os
## [1] "mingw32"
## 
## $crt
## [1] "ucrt"
## 
## $system
## [1] "x86_64, mingw32"
## 
## $status
## [1] ""
## 
## $major
## [1] "4"
## 
## $minor
## [1] "2.0"
## 
## $year
## [1] "2022"
## 
## $month
## [1] "04"
## 
## $day
## [1] "22"
## 
## $`svn rev`
## [1] "82229"
## 
## $language
## [1] "R"
## 
## $version.string
## [1] "R version 4.2.0 (2022-04-22 ucrt)"
## 
## $nickname
## [1] "Vigorous Calisthenics"

Created on 2022-06-21 by the reprex package (v2.0.1)

More on context

This question came up because I was trying to produce results like this (for a simpler example) in the emmeans package:

> (jt = joint_tests(warpx.emm))
 model term   df1 df2 F.ratio p.value note
 tension        1  37   5.741  0.0217    e
 wool:tension   1  37   5.867  0.0204    e
 (confounded)   2  37   7.008  0.0026  d e

d: df1 reduced due to linear dependence 
e: df1 reduced due to non-estimability

... and in particular, the (confounded) part. This example is a two-factor model with wool at 2 levels and tension at 3 levels; however, one of the factor combinations is omitted from the data, which means that we can estimate only 1 df for each of the tension main effect and the wool:tension interaction effect, and no main effect at all for wool. There being 4 df for all possible contrasts among the 5 nonempty cells, there are 2 df left over, and those are in the (confounded) part.

The computation is based on the associated estimable functions:

> attr(jt, "est.fcns")
$tension
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0     0        1        0            0.5              0

$`wool:tension`
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0     0        0        0              1              0

$`(confounded)`
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0    -1        0        0              0              0
[2,]           0     1        0        0              0              0
[3,]           0    -1        0        0              0              0
[4,]           0    -1        0        1              0              0

... and on the contrasts among all cells in the design:

> contrast(warpx.emm, "consec")@linfct
     (Intercept) woolB tensionM tensionH woolB:tensionM woolB:tensionH
[1,]           0     1        0        0              0              0
[2,]           0    -1        1        0              0              0
[3,]           0     1        0        0              1              0
[4,]           0    -1       -1        1             -1              0
[5,]           0     1        0        0              0              1

The method I use is to combine the estimable functions for tension and wool:tension and obtain the QR decomposition of its transpose. Then I use qr.resid() with that and the transpose of the above cell contrasts. That leaves us (after transposing yet again) with the estimable functions shown for (confounded). That matrix has 4 rows, but its rank is only 2, as determined by the QR decomposition of this result; then I extract the 2x2 portion of the R part to complete the computation of the F statistic.
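In case that procedure is easier to follow in code, here is a toy sketch with made-up matrices (these are illustrative, not the actual emmeans internals): E plays the role of the combined estimable functions and L the cell contrasts, and qr.resid() removes from each row of L its component in the row space of E.

```r
# Hypothetical example: the rows of E span the first two coordinates,
# so the residual step zeroes those coordinates out of each row of L
E <- rbind(c(1, 0, 0, 0),
           c(0, 1, 0, 0))
L <- rbind(c(1, 1, 0, 0),
           c(0, 0, 1, 0),
           c(0, 0, 1, 1),
           c(1, 0, 0, 1))

qE <- qr(t(E))                     # QR decomposition of the transpose of E
leftover <- t(qr.resid(qE, t(L)))  # parts of L's rows orthogonal to E's rows

qr(t(leftover))$rank               # rank of the "leftover" part: 2 here
```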

The example at the beginning of this question is similar, but with a larger, more complex model; the badL matrix is the result of the qr.resid() procedure described above. In this context, some of those rows arguably should be zero. My workaround at present is to examine the diagonal elements of R (qr.R(badQR) above) and select those that exceed an absolute threshold.

The essential idea here is that I need to decompose that matrix of all contrasts into two parts: the known estimable functions and the leftovers. An interesting aspect of this is that the rank of the latter part is known, a fact that I have not taken advantage of. In future development, it may well be better, per @duffymo, to use an SVD rather than these gyrations with qr.resid(). There's always new stuff to learn...

You are complaining that solve cannot invert a matrix that seems to be full rank (according to qr). And you think that solve is doing the correct thing, while qr is not.

Well, do not trust solve. It is not a robust numerical procedure and we can fool it easily. Here is a diagonal matrix. It is certainly invertible (by simply inverting its diagonal elements), but solve just can't do it.

D <- diag(c(1, 1e-20))
#     [,1]  [,2]
#[1,]    1 0e+00
#[2,]    0 1e-20

solve(D)
#Error in solve.default(D) : 
#  system is computationally singular: reciprocal condition number = 1e-20

Dinv <- diag(c(1, 1e+20))

## an identity matrix, as expected
D %*% Dinv
#     [,1] [,2]
#[1,]    1    0
#[2,]    0    1

## an identity matrix, as expected
Dinv %*% D
#     [,1] [,2]
#[1,]    1    0
#[2,]    0    1

Now let's look at your badX, which I call R (as it is the upper-triangular matrix returned by QR factorization).

R <-
structure(c(-1.641906809157e-10, 0, 0, 0, 0, -0.5, 0, 0, -1.10482935525559e-16, 
            0, -3.06266685765538e-17, 0, -4.83736007092039e-17, 0, -3.14414492582296e-18, 
            -3.06158275836099e-18), dim = c(4L, 4L))

solve cannot invert it, but qr.solve gives you a proper inverse matrix.

Rinv <- qr.solve(R)

## an identity matrix, as expected
R %*% Rinv
#     [,1] [,2] [,3]         [,4]
#[1,]    1    0    0 1.776357e-15
#[2,]    0    1    0 0.000000e+00
#[3,]    0    0    1 0.000000e+00
#[4,]    0    0    0 1.000000e+00

## an identity matrix, as expected
Rinv %*% R
#     [,1] [,2] [,3]         [,4]
#[1,]    1    0    0 5.293956e-23
#[2,]    0    1    0 0.000000e+00
#[3,]    0    0    1 1.387779e-17
#[4,]    0    0    0 1.000000e+00

QR factorization is numerically stable because it is less sensitive to the scale (or size, magnitude) of the different columns. (Other factorizations, like LU, on which solve is based, and SVD, are sensitive to it.) By definition, this factorization does

X = QR

If we re-scale the columns of X by right-multiplying by a full-rank diagonal matrix D, the Q factor does not change:

XD = Q(RD)
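A small numerical check of that claim, on a toy matrix (using a positive rescaling, and LAPACK = TRUE so that no column pivoting interferes):

```r
set.seed(1)
X1 <- matrix(rnorm(12), nrow = 4, ncol = 3)
D  <- diag(c(1, 1e-8, 1e3))   # wildly different column scales

Q1 <- qr.Q(qr(X1,       LAPACK = TRUE))
Q2 <- qr.Q(qr(X1 %*% D, LAPACK = TRUE))

max(abs(Q1 - Q2))  # essentially zero: Q is unaffected by column scaling
```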

So let's look at your big matrix t(badL), to which you apply the QR factorization. I call it X.

X <- structure(c(0, -9.89189274870351e-11, 0, 0, 0, 0, 0, 9.89189274870351e-11, 
0, 0, 0, -4.23939184015843e-11, 0, -4.23939184015843e-11, 0, 
0, 0, 0, 0, 4.23939184015843e-11, 0, -1.41313127284714e-11, 4.23939184015843e-11, 
0, 0, 0, 0, 0, 0, -0.25, -0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 
0, 0.25, 0, 0, 0, 0, 0, 0, 0, -5.55111512312578e-17, 0, 0, 0, 
0, 0, 5.55111512312578e-17, 0, 0, 0, -4.16333634234434e-17, 0, 
-4.16333634234434e-17, 0, 0, 0, 0, 0, 4.16333634234434e-17, 0, 
-6.93889390390723e-18, 4.16333634234434e-17, 0, 0, -2.77555756156289e-17, 
0, 0, 0, 0, 0, 2.77555756156289e-17, 0, 0, 0, -1.38777878078145e-17, 
0, -1.38777878078145e-17, 0, 0, 0, 0, 0, 1.38777878078145e-17, 
0, -6.93889390390723e-18, 1.38777878078145e-17, 0, 0, 1.11022302462516e-16, 
0, 0, 0, 0, 0, -1.11022302462516e-16, 0, 0, 0, 5.55111512312578e-17, 
0, 5.55111512312578e-17, 0, 0, 0, 0, 0, -5.55111512312578e-17, 
0, 1.38777878078145e-17, -5.55111512312578e-17, 0), dim = c(24L, 
5L))
#               [,1]  [,2]          [,3]          [,4]          [,5]
# [1,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
# [2,] -9.891893e-11  0.00 -5.551115e-17 -2.775558e-17  1.110223e-16
# [3,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
# [4,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
# [5,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
# [6,]  0.000000e+00 -0.25  0.000000e+00  0.000000e+00  0.000000e+00
# [7,]  0.000000e+00 -0.25  0.000000e+00  0.000000e+00  0.000000e+00
# [8,]  9.891893e-11  0.00  5.551115e-17  2.775558e-17 -1.110223e-16
# [9,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[10,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[11,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[12,] -4.239392e-11  0.00 -4.163336e-17 -1.387779e-17  5.551115e-17
#[13,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[14,] -4.239392e-11  0.00 -4.163336e-17 -1.387779e-17  5.551115e-17
#[15,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[16,]  0.000000e+00  0.25  0.000000e+00  0.000000e+00  0.000000e+00
#[17,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[18,]  0.000000e+00  0.25  0.000000e+00  0.000000e+00  0.000000e+00
#[19,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[20,]  4.239392e-11  0.00  4.163336e-17  1.387779e-17 -5.551115e-17
#[21,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00
#[22,] -1.413131e-11  0.00 -6.938894e-18 -6.938894e-18  1.387779e-17
#[23,]  4.239392e-11  0.00  4.163336e-17  1.387779e-17 -5.551115e-17
#[24,]  0.000000e+00  0.00  0.000000e+00  0.000000e+00  0.000000e+00

Let's re-scale its columns so that every column has Euclidean norm (L2 norm, 2-norm) 1.

norm2 <- sqrt(colSums(X ^ 2))

XD <- X * rep(1 / norm2, each = nrow(X))
#             [,1] [,2]        [,3]       [,4]        [,5]
# [1,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
# [2,] -0.60246371  0.0 -0.48418203 -0.5714286  0.57585260
# [3,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
# [4,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
# [5,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
# [6,]  0.00000000 -0.5  0.00000000  0.0000000  0.00000000
# [7,]  0.00000000 -0.5  0.00000000  0.0000000  0.00000000
# [8,]  0.60246371  0.0  0.48418203  0.5714286 -0.57585260
# [9,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[10,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[11,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[12,] -0.25819930  0.0 -0.36313652 -0.2857143  0.28792630
#[13,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[14,] -0.25819930  0.0 -0.36313652 -0.2857143  0.28792630
#[15,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[16,]  0.00000000  0.5  0.00000000  0.0000000  0.00000000
#[17,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[18,]  0.00000000  0.5  0.00000000  0.0000000  0.00000000
#[19,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[20,]  0.25819930  0.0  0.36313652  0.2857143 -0.28792630
#[21,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000
#[22,] -0.08606647  0.0 -0.06052275 -0.1428571  0.07198158
#[23,]  0.25819930  0.0  0.36313652  0.2857143 -0.28792630
#[24,]  0.00000000  0.0  0.00000000  0.0000000  0.00000000

What do you think now? Is it still a matrix with only one nonzero column? Although qr(X) does not actually re-scale all columns before QR factorization, looking at XD does help you better understand why QR factorization is more robust.

If you do want to intervene, do not use zapsmall; instead, threshold the columns by their 2-norms.

X0 <- X
X0[, norm2 < max(norm2) * sqrt(.Machine$double.eps)] <- 0
QR0 <- qr(X0)

QR0$rank
# [1] 1

How do we know that sqrt(.Machine$double.eps) is an appropriate threshold?

Any threshold between sqrt(.Machine$double.eps) (about 1e-8) and .Machine$double.eps (about 2.2e-16) is reasonable. Using .Machine$double.eps recovers the regular QR result, giving you rank 4.

The "sqrt" threshold comes from the situation where we want to look at X'X, which squares the condition number of X.
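A quick numerical illustration of that squaring effect (kappa() with exact = TRUE computes the 2-norm condition number from the singular values):

```r
A <- diag(c(1, 1e-5))

kappa(A, exact = TRUE)             # about 1e5
kappa(crossprod(A), exact = TRUE)  # about 1e10: cond(A'A) = cond(A)^2
```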

I would suggest that you prefer the Singular Value Decomposition. It will give you the best solution possible if the matrix is rank-deficient. Here's an example of how to use it in R.
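One common sketch of an SVD-based rank and pseudoinverse computation looks like this (pinv is a helper defined here, not a base R function; MASS::ginv implements the same idea). Singular values below a relative tolerance are treated as zero:

```r
# Moore-Penrose pseudoinverse via SVD; 'tol' is relative to the
# largest singular value
pinv <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)
  # V * diag(1/d) * U', restricted to the retained singular values
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

A <- matrix(c(1, 2, 2, 4), 2, 2)   # exactly singular, rank 1
sum(svd(A)$d > sqrt(.Machine$double.eps) * max(svd(A)$d))  # numerical rank: 1
A %*% pinv(A) %*% A                # recovers A, even though solve(A) fails
```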


 