Obtain standardised residuals and "Residuals vs. Fitted" plot for "mlm" object from `lm()`
set.seed(0)
## 2 responses with 10 observations each
response <- matrix(rnorm(20), 10, 2)
## 3 covariates with 10 observations each
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)
I have been generating residual plots for the entire model using:
plot(fitted(fit),residuals(fit))
However, I would like to make individual plots for each predictor covariate. I can do them one at a time with:
f <- fitted(fit)
r <- residuals(fit)
plot(f[,1],r[,1])
The issue with this approach, however, is that it needs to generalize to data sets with more predictor covariates. Is there a way to use `plot()` while iterating through each column of `f` and `r`? Or is there a way for `plot()` to group each covariate by colour?
Make sure you are using standardised residuals rather than raw residuals
I often see `plot(fitted(fit), residuals(fit))`, but it is statistically wrong. We use `plot(fit)` to generate diagnostic plots, because we need standardised residuals rather than raw ones. Read `?plot.lm` for more. But the `plot` method for "mlm" is poorly supported:
plot(fit)
# Error: 'plot.mlm' is not implemented yet
Define "rstandard" S3 method for "mlm"
`plot.mlm` is not supported for many reasons, one of which is the lack of `rstandard.mlm`. The S3 generic "rstandard", which returns standardised residuals, has methods for the "lm" and "glm" classes:
methods(rstandard)
# [1] rstandard.glm* rstandard.lm*
There is no support for "mlm". So we shall fill this gap first.
It is not difficult to get standardised residuals. Let `hii` be the diagonal of the hat matrix; the point-wise estimated standard error of the residuals is `sqrt(1 - hii) * sigma`, where `sigma = sqrt(RSS / df.residual)` is the estimated residual standard error. `RSS` is the residual sum of squares; `df.residual` is the residual degrees of freedom.
`hii` can be computed from the `Q` factor of the QR factorization of the model matrix: `rowSums(Q ^ 2)`. For "mlm", there is only one QR decomposition since the model matrix is the same for all responses, hence we only need to compute `hii` once.
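As a quick sanity check of the `rowSums(Q ^ 2)` claim, here is a minimal sketch on the example data from the question. It assumes the model matrix has full column rank; the names `hii_direct`, `hii_qr`, `max_gap` and `hat_trace` are introduced here for illustration only.

```r
## Sketch: two ways to get the hat-matrix diagonal (assumes full-rank X)
set.seed(0)
response   <- matrix(rnorm(20), 10, 2)
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

X <- model.matrix(fit)  ## one model matrix, shared by every response

## (a) explicit hat matrix X (X'X)^{-1} X'
hii_direct <- diag(X %*% solve(crossprod(X)) %*% t(X))

## (b) row sums of squares of the thin Q factor
Q <- qr.Q(qr(X))
hii_qr <- rowSums(Q ^ 2)

max_gap   <- max(abs(hii_direct - hii_qr))  ## should be at rounding level
hat_trace <- sum(hii_qr)                    ## equals the number of coefficients
```

Route (b) avoids forming the full n-by-n hat matrix, which is why it is the natural choice inside a method that may see many observations.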
Each response has its own `sigma`, but all of them can be computed elegantly in one go: `sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit))`.
Now, let's wrap up those ideas to get our own "rstandard" method for "mlm":
## define our own "rstandard" method for "mlm" class
## (`...` is accepted only to match the signature of the generic)
rstandard.mlm <- function (model, ...) {
  Q <- with(model, qr.qy(qr, diag(1, nrow = nrow(qr$qr), ncol = qr$rank)))  ## Q matrix
  hii <- rowSums(Q ^ 2)  ## diagonal of hat matrix QQ'
  RSS <- colSums(model$residuals ^ 2)  ## residual sum of squares (for each response)
  sigma <- sqrt(RSS / model$df.residual)  ## estimated residual standard error (for each response)
  pointwise_sd <- outer(sqrt(1 - hii), sigma)  ## point-wise residual standard error (for each response)
  model$residuals / pointwise_sd  ## standardised residuals
}
Note the `.mlm` suffix in the function name, which tells R that this is the S3 method for class "mlm". Once we have defined this function, we can see it among the "rstandard" methods:
## now there is a method for "mlm"
methods(rstandard)
# [1] rstandard.glm* rstandard.lm* rstandard.mlm
To call this function, we don't have to explicitly call `rstandard.mlm`; calling `rstandard` is enough:
## test with your fitted model `fit`
rstandard(fit)
# [,1] [,2]
#1 1.56221865 2.6593505
#2 -0.98791320 -1.9344546
#3 0.06042529 -0.4858276
#4 0.18713629 2.9814135
#5 0.11277397 1.4336484
#6 -0.74289985 -2.4452868
#7 0.03690363 0.7015916
#8 -1.58940448 -1.2850961
#9 0.38504435 1.3907223
#10 1.34618139 -1.5900891
Under the model assumptions, standardised residuals are approximately `N(0, 1)` distributed.
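One way to gain confidence in this kind of computation is to compare it with what `rstandard()` returns when each response is fitted on its own with a plain `lm()`. The sketch below does that on the example data; `std_res_mlm` is a hypothetical stand-alone helper (called directly rather than through S3 dispatch), and `by_column` and `max_diff` are names introduced here.

```r
## Sketch: standardised residuals from one "mlm" fit vs. column-by-column "lm" fits
set.seed(0)
response   <- matrix(rnorm(20), 10, 2)
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

## hypothetical helper following the hii / sigma recipe described above
std_res_mlm <- function(model) {
  qr_obj <- model$qr
  Q <- qr.qy(qr_obj, diag(1, nrow = nrow(qr_obj$qr), ncol = qr_obj$rank))
  hii <- rowSums(Q ^ 2)                                    ## hat diagonal, computed once
  sigma <- sqrt(colSums(model$residuals ^ 2) / model$df.residual)  ## one per response
  model$residuals / outer(sqrt(1 - hii), sigma)
}

## reference: fit each response separately and use the built-in "lm" method
by_column <- sapply(seq_len(ncol(response)),
                    function(j) rstandard(lm(response[, j] ~ predictors)))

max_diff <- max(abs(std_res_mlm(fit) - by_column))  ## should be at rounding level
```

The column-by-column route refits the model once per response, so it is only meant as a cross-check; the single-QR version scales better when there are many responses.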
Getting residuals vs fitted plot for "mlm"
Your initial try with:
f <- fitted(fit); r <- rstandard(fit); plot(f, r)
is not a bad idea, provided that the dots for different models can be told apart. So we can try using different point colours for different models:
plot(f, r, col = as.numeric(col(f)), pch = 19)
Graphical arguments like `col`, `pch` and `cex` can take vector input. I ask `plot` to use `col = j` for `r[,j] ~ f[,j]`, where `j = 1, 2,..., ncol(f)`. Read "Color Specification" in `?par` for what `col = j` means. `pch = 19` tells `plot` to draw solid dots. Read up on the basic graphical parameters for the various choices.
Finally, you may want a legend. You can do
plot(f, r, col = as.numeric(col(f)), pch = 19, ylim = c(-3, 4))
legend("topleft", legend = paste0("response ", 1:ncol(f)), pch = 19,
col = 1:ncol(f), text.col = 1:ncol(f))
To leave space for the legend box we extend `ylim` a little bit. As standardised residuals are `N(0,1)`, `ylim = c(-3, 3)` is a good range. Should we want to place the legend box at the top left, we extend `ylim` to `c(-3, 4)`. You can customize your legend further via `ncol`, `title`, etc.
How many responses do you have?
If you have no more than a few responses, the above suggestion works nicely. If you have plenty, plotting them in separate plots is suggested. A `for` loop, as you found out, is decent, except that you need to split the plotting region into subplots, possibly using `par(mfrow = c(?, ?))`. Also set the inner margin `mar` and the outer margin `oma` if you take this approach. You may read "How to produce a nicer plot for my categorical time series data in a matrix?" for one example of doing this.
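A minimal sketch of the one-panel-per-response idea on the example data follows. The grid shape comes from `n2mfrow()`, and the `mar` / `oma` values are arbitrary choices made here rather than recommendations; it also assumes a full-rank model matrix when standardising the residuals.

```r
## Sketch: one residuals-vs-fitted panel per response
set.seed(0)
response   <- matrix(rnorm(20), 10, 2)
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

f <- fitted(fit)
hii   <- rowSums(qr.Q(fit$qr) ^ 2)                           ## hat diagonal (full rank assumed)
sigma <- sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit))
r <- residuals(fit) / outer(sqrt(1 - hii), sigma)            ## standardised residuals

grid_dim <- n2mfrow(ncol(f))                                 ## rows x columns of panels
old_par <- par(mfrow = grid_dim, mar = c(4, 4, 2, 1), oma = c(0, 0, 2, 0))
for (j in seq_len(ncol(f))) {
  plot(f[, j], r[, j], pch = 19, ylim = c(-3, 3),
       xlab = "fitted values", ylab = "standardised residuals",
       main = paste("response", j))
  abline(h = 0, lty = 2)                                     ## reference line at zero
}
mtext("Residuals vs fitted", outer = TRUE)                   ## shared title in the outer margin
par(old_par)                                                 ## restore previous settings
```

Restoring `old_par` at the end keeps later plots in the same session from inheriting the multi-panel layout.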
If you have even more responses, you might want a mixture of both. Say you have 42 responses: you can do `par(mfrow = c(2, 3))`, then plot 7 responses in each subfigure. Now the solution is more opinion-based.
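To illustrate the mixed layout, here is a rough sketch with 42 simulated responses split into 6 panels of 7, each response in its own colour. The data, the seed, the sample size of 50 and the names `per_panel` and `groups` are all assumptions made up for this example.

```r
## Sketch: 42 responses, 2 x 3 panels, 7 colour-coded responses per panel
set.seed(1)
n <- 50
response   <- matrix(rnorm(n * 42), n, 42)   ## simulated data, illustration only
predictors <- matrix(rnorm(n * 3), n, 3)
fit <- lm(response ~ predictors)

f <- fitted(fit)
hii   <- rowSums(qr.Q(fit$qr) ^ 2)           ## hat diagonal (full rank assumed)
sigma <- sqrt(colSums(residuals(fit) ^ 2) / df.residual(fit))
r <- residuals(fit) / outer(sqrt(1 - hii), sigma)

per_panel <- 7
groups <- split(seq_len(ncol(f)), ceiling(seq_len(ncol(f)) / per_panel))

old_par <- par(mfrow = c(2, 3), mar = c(4, 4, 2, 1))
for (idx in groups) {
  ## colours 1..7 within each panel, one per response in the group
  plot(f[, idx], r[, idx], col = as.numeric(col(f[, idx])), pch = 19,
       xlab = "fitted values", ylab = "standardised residuals",
       main = paste0("responses ", min(idx), "-", max(idx)))
}
par(old_par)
```

Because the colours restart at 1 in every panel, a per-panel legend or the panel title is needed to tell which response a colour refers to.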
This is how I solved it.
for(i in 1:ncol(f)) {
plot(f[,i],r[,i])
}
Mind blown.