
Obtain standardised residuals and a "Residuals vs Fitted" plot for an "mlm" object from `lm()`

set.seed(0)
## 2 responses with 10 observations each
response <- matrix(rnorm(20), 10, 2)
## 3 covariates with 10 observations each
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

I have been generating residual plots for the entire model using:

plot(fitted(fit),residuals(fit))

However, I would like to make individual plots for each predictor covariate. I can do them one at a time with:

f <- fitted(fit)
r <- residuals(fit)
plot(f[,1],r[,1])

The issue with this approach, however, is that it needs to generalize to data sets with more predictor covariates. Is there a way to use plot while iterating through each column of f and r? Or is there a way for plot() to group each covariate by colour?

Make sure you are using standardised residuals rather than raw residuals

I often see plot(fitted(fit), residuals(fit)), but it is statistically wrong: raw residuals do not share a common variance, even when the errors do. We use plot(fit) to generate diagnostic plots because we need standardised residuals rather than raw ones. Read ?plot.lm for more. But the plot method for "mlm" is poorly supported:

plot(fit)
# Error: 'plot.mlm' is not implemented yet

Define an "rstandard" S3 method for "mlm"

plot.mlm is not supported for many reasons, one of which is the lack of rstandard.mlm. For the "lm" and "glm" classes, the S3 generic rstandard has methods that return standardised residuals:

methods(rstandard)
# [1] rstandard.glm* rstandard.lm*

There is no support for "mlm", so we shall fill this gap first.

It is not difficult to get standardised residuals. Let hii be the diagonal elements of the hat matrix. The point-wise estimated standard error of the residuals is sqrt(1 - hii) * sigma, where sigma = sqrt(RSS / df.residual) is the estimated residual standard error, RSS is the residual sum of squares, and df.residual is the residual degrees of freedom.

hii can be computed from the Q factor of the QR factorization of the model matrix: rowSums(Q ^ 2). For "mlm" there is only one QR decomposition, since the model matrix is the same for all responses, so we only need to compute hii once.
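As a rough illustration of the rowSums(Q ^ 2) identity, the sketch below builds a small simulated model matrix (the data and the names X, Q and H are assumptions made for this example, not the asker's objects) and compares the result with the diagonal of the explicitly formed hat matrix.

```r
## Sketch only: simulated design matrix, not the data from the question
set.seed(1)
X <- cbind(1, matrix(rnorm(30), 10, 3))   # 10 x 4 model matrix with an intercept column
Q <- qr.Q(qr(X))                          # thin Q factor, 10 x 4
hii <- rowSums(Q ^ 2)                     # diagonal of Q Q'
H <- X %*% solve(crossprod(X), t(X))      # explicit hat matrix, formed here for checking only
all.equal(hii, diag(H))                   # the two computations should agree
sum(hii)                                  # trace of H: the number of coefficients, up to rounding
```

Forming H explicitly costs far more than the QR route and is done here only to check the identity.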

Each response has its own sigma, but they can all be computed at once as sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit)).
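To make the column-wise sigma concrete, here is a minimal check that reuses the simulated data from the question. The names sigma_all and sigma_sep are introduced only for this sketch. It compares the one-shot computation with the residual standard errors reported when each response is fitted on its own.

```r
## Sketch: per-response sigma from the "mlm" fit vs. separate "lm" fits
set.seed(0)
response <- matrix(rnorm(20), 10, 2)
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

## all responses at once
sigma_all <- sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit))

## one response at a time, for comparison
sigma_sep <- sapply(1:2, function(j) summary(lm(response[, j] ~ predictors))$sigma)

all.equal(unname(sigma_all), sigma_sep)
```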

Now, let's wrap up those ideas to get our own "rstandard" method for "mlm":

## define our own "rstandard" method for "mlm" class
## (`...` is included to match the signature of the rstandard generic)
rstandard.mlm <- function (model, ...) {
  Q <- with(model, qr.qy(qr, diag(1, nrow = nrow(qr$qr), ncol = qr$rank)))  ## Q matrix
  hii <- rowSums(Q ^ 2)  ## diagonal of hat matrix QQ'
  RSS <- colSums(model$residuals ^ 2)  ## residual sum of squares (one per response)
  sigma <- sqrt(RSS / model$df.residual)  ## residual standard error (one per response)
  pointwise_sd <- outer(sqrt(1 - hii), sigma)  ## point-wise residual standard error (one column per response)
  model$residuals / pointwise_sd  ## standardised residuals
}

Note the .mlm suffix in the function name: it tells R that this is the S3 method for class "mlm". Once we have defined this function, it shows up among the methods of rstandard:

## now there is a method for "mlm"
methods(rstandard)
# [1] rstandard.glm* rstandard.lm*  rstandard.mlm

To use it, we don't have to call rstandard.mlm explicitly; calling rstandard is enough:

## test with your fitted model `fit`
rstandard(fit)
#          [,1]       [,2]
#1   1.56221865  2.6593505
#2  -0.98791320 -1.9344546
#3   0.06042529 -0.4858276
#4   0.18713629  2.9814135
#5   0.11277397  1.4336484
#6  -0.74289985 -2.4452868
#7   0.03690363  0.7015916
#8  -1.58940448 -1.2850961
#9   0.38504435  1.3907223
#10  1.34618139 -1.5900891

Standardised residuals are approximately N(0, 1) distributed when the model assumptions hold.
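One way to sanity-check the formula is to compare it, column by column, with what rstandard returns for ordinary single-response fits. The sketch below recomputes the standardised residuals inline, so that it runs on its own. The names r_mlm and r_sep are assumptions for this example, and qr.Q is used as an equivalent shortcut for the Q factor, assuming a full-rank model matrix.

```r
## Sketch: standardised residuals for an "mlm" fit, checked against per-response "lm" fits
set.seed(0)
response <- matrix(rnorm(20), 10, 2)
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

hii <- rowSums(qr.Q(fit$qr) ^ 2)                                   # hat diagonal (full rank assumed)
sigma <- sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit))      # one sigma per response
r_mlm <- residuals(fit) / outer(sqrt(1 - hii), sigma)              # 10 x 2 matrix

## reference: fit each response separately and use the built-in "lm" method
r_sep <- sapply(1:2, function(j) rstandard(lm(response[, j] ~ predictors)))

all.equal(unname(r_mlm), unname(r_sep))
```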


Getting a residuals vs fitted plot for "mlm"

Your initial try with:

f <- fitted(fit); r <- rstandard(fit); plot(f, r)

is not a bad idea, provided that the dots for different models can be told apart. So we can try using different point colours for different models:

plot(f, r, col = as.numeric(col(f)), pch = 19)

Graphical arguments like col, pch and cex can take vector input. I ask plot to use col = j for r[,j] ~ f[,j], where j = 1, 2, ..., ncol(f). Read "Color Specification" in ?par for what col = j means. pch = 19 tells plot to draw solid dots. Read about the basic graphical parameters for other choices.
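To see why col = as.numeric(col(f)) gives one colour per response, it may help to look at what col() returns for a small matrix. The toy matrix below is an assumption for illustration only.

```r
## Sketch: col() returns the column index of every matrix entry
f <- matrix(c(0.1, 0.2, 0.3, 1.1, 1.2, 1.3), nrow = 3, ncol = 2)
col(f)              # integer matrix with the same shape as f, filled with column indices
as.numeric(col(f))  # flattened column by column: 1 1 1 2 2 2
```

Because plot flattens the matrices f and r in the same column-major order, the j-th block of points receives colour index j.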

Finally, you may want a legend. You can do:

plot(f, r, col = as.numeric(col(f)), pch = 19, ylim = c(-3, 4))
legend("topleft", legend = paste0("response ", 1:ncol(f)), pch = 19,
       col = 1:ncol(f), text.col = 1:ncol(f))

To leave space for the legend box we extend ylim a little. As standardised residuals are approximately N(0, 1), ylim = c(-3, 3) is a good range. Since we want to place the legend box at the top left, we extend ylim to c(-3, 4). You can customize the legend further via ncol, title, etc.



How many responses do you have?

If you have no more than a few responses, the suggestion above works nicely. If you have plenty, plotting them in separate plots is suggested. A for loop, as you found out, is decent, except that you need to split the plotting region into subplots, possibly using par(mfrow = c(?, ?)). Also set the inner margin mar and the outer margin oma if you take this approach. You may read "How to produce a nicer plot for my categorical time series data in a matrix?" for one example of doing this.
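A minimal sketch of the one-panel-per-response layout follows. It assumes four simulated responses and computes the standardised residuals inline so that it runs on its own. The margin values and the use of n2mfrow() to pick the grid are illustrative choices, not the only reasonable ones.

```r
## Sketch: one "residuals vs fitted" panel per response (simulated data)
set.seed(0)
response <- matrix(rnorm(40), 10, 4)       # 4 responses, an assumption for this example
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

f <- fitted(fit)
hii <- rowSums(qr.Q(fit$qr) ^ 2)
sigma <- sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit))
r <- residuals(fit) / outer(sqrt(1 - hii), sigma)   # standardised residuals

k <- ncol(f)
op <- par(mfrow = n2mfrow(k), mar = c(4, 4, 2, 1), oma = c(0, 0, 2, 0))
for (j in seq_len(k)) {
  plot(f[, j], r[, j], pch = 19, xlab = "fitted", ylab = "standardised residual",
       main = paste("response", j))
  abline(h = 0, lty = 2)                   # reference line at zero
}
mtext("Residuals vs fitted, one panel per response", outer = TRUE)
par(op)                                    # restore the previous graphical settings
```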

If you have even more responses, you might want a mixture of both. Say you have 42 responses: you can do par(mfrow = c(2, 3)), then plot 7 responses in each subfigure. At that point the solution is more opinion-based.
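The mixed layout could look roughly like the sketch below, which assumes 42 simulated responses with 30 observations each and groups them into six panels of seven colours. The grouping via split() and the legend settings are hypothetical choices made for this example.

```r
## Sketch: 42 responses shown as 6 panels with 7 colour-coded responses each (simulated data)
set.seed(0)
n <- 30
response <- matrix(rnorm(n * 42), n, 42)
predictors <- matrix(rnorm(n * 3), n, 3)
fit <- lm(response ~ predictors)

f <- fitted(fit)
hii <- rowSums(qr.Q(fit$qr) ^ 2)
sigma <- sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit))
r <- residuals(fit) / outer(sqrt(1 - hii), sigma)   # standardised residuals

## column indices 1..42 split into consecutive groups of 7
groups <- split(seq_len(ncol(f)), ceiling(seq_len(ncol(f)) / 7))

op <- par(mfrow = c(2, 3), mar = c(4, 4, 2, 1))
for (idx in groups) {
  plot(f[, idx], r[, idx], col = as.numeric(col(f[, idx])), pch = 19,
       xlab = "fitted", ylab = "standardised residual", ylim = c(-3, 4),
       main = paste("responses", min(idx), "to", max(idx)))
  legend("topleft", legend = idx, col = seq_along(idx), pch = 19, ncol = 4, cex = 0.7)
}
par(op)
```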

This is how I solved it.

for(i in 1:ncol(f)) {
    plot(f[,i],r[,i])
}

Mind blown.
