Residual variance extracted from glm and lmer in R
I am trying to take what I have read about multilevel modelling and merge it with what I know about glm in R. I am now using the height growth data from here.
I have done some coding, shown below:
library(lme4)
library(ggplot2)
setwd("~/Documents/r_code/multilevel_modelling/")
rm(list=ls())
oxford.df <- read.fwf("oxboys/OXBOYS.DAT",widths=c(2,7,6,1))
names(oxford.df) <- c("stu_code","age_central","height","occasion_id")
oxford.df <- oxford.df[!is.na(oxford.df[,"age_central"]),]
oxford.df[,"stu_code"] <- factor(as.character(oxford.df[,"stu_code"]))
oxford.df[,"dummy"] <- 1
chart <- ggplot(data=oxford.df,aes(x=occasion_id,y=height))
chart <- chart + geom_point(aes(colour=stu_code))
# see if lm and glm give the same estimate
glm.01 <- lm(height~age_central+occasion_id,data=oxford.df)
glm.02 <- glm(height~age_central+occasion_id,data=oxford.df,family="gaussian")
summary(glm.02)
vcov(glm.02)
var(resid(glm.02))
(logLik(glm.01)*-2)-(logLik(glm.02)*-2)
1-pchisq(-2.273737e-13,1)
# lm and glm give the same estimation
# so glm.02 will be used from now on
# see if lmer without level2 variable give same result as glm.02
mlm.03 <- lmer(height~age_central+occasion_id+(1|dummy),data=oxford.df,REML=FALSE)
(logLik(glm.02)*-2)-(logLik(mlm.03)*-2)
# 1-pchisq(-3.408097e-07,1)
# glm.02 and mlm.03 give the same estimation, only if REML=FALSE
mlm.03 gives me the following output:
> mlm.03
Linear mixed model fit by maximum likelihood
Formula: height ~ age_central + occasion_id + (1 | dummy)
   Data: oxford.df
  AIC  BIC logLik deviance REMLdev
 1650 1667 -819.9     1640    1633
Random effects:
 Groups   Name        Variance Std.Dev.
 dummy    (Intercept)  0.000   0.0000
 Residual             64.712   8.0444
Number of obs: 234, groups: dummy, 1
Fixed effects:
            Estimate Std. Error t value
(Intercept)  142.994     21.132   6.767
age_central    1.340     17.183   0.078
occasion_id    1.299      4.303   0.302
Correlation of Fixed Effects:
            (Intr) ag_cnt
age_central  0.999
occasion_id -1.000 -0.999
You can see that there is a variance for the residual in the Random effects section. According to Applied Multilevel Analysis - A Practical Guide by Jos WR Twisk, this represents the amount of "unexplained variance" from the model.
I wondered if I could arrive at the same residual variance from glm.02, so I tried the following:
> var(resid(glm.01))
[1] 64.98952
> sd(resid(glm.01))
[1] 8.061608
The results are slightly different from the mlm.03 output. Does this refer to the same "residual variance" stated in mlm.03?
Your glm.02 and glm.01 estimate a linear regression model using least squares. On the other hand, mlm.03 is a linear mixed model estimated through maximum likelihood. I don't know your dataset, but it looks like you use the dummy variable to create a cluster structure at level 2 with zero variance.
So your question basically has two answers, but only the second one is important in your case. The models glm.02 and mlm.03 do not contain the same residual variance estimate, because:
1. The models are usually different (mixed effects vs. classical regression). In your case, however, the dummy variable seems to suppress the additional variance component in the mixed model, so to me the two models appear to be equivalent.
2. The method used to estimate the residual variance is different. glm uses LS, whereas lmer uses ML in your code. ML estimates of the residual variance are slightly biased downward (resulting in smaller variance estimates). This can be solved by using REML instead of ML to estimate the variance components.
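To make the size of that bias concrete, here is a minimal sketch in base R. It uses simulated data (the variable names and coefficients are made up for illustration, not taken from the Oxford boys file) and compares three denominators for the same residual sum of squares: n for ML, n - 1 for var(resid(...)), and n - p for the usual unbiased estimate, which is also what REML reproduces when there is no extra variance component.

```r
# Simulated data; hypothetical stand-in for the real dataset
set.seed(1)
n  <- 234
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 140 + 1.3 * x1 + 1.3 * x2 + rnorm(n, sd = 8)

fit <- lm(y ~ x1 + x2)
rss <- sum(resid(fit)^2)   # residual sum of squares
p   <- length(coef(fit))   # 3 coefficients, including the intercept

ml_var  <- rss / n         # ML estimate of the residual variance
var_res <- rss / (n - 1)   # what var(resid(fit)) computes
ols_var <- rss / (n - p)   # unbiased estimate, equal to sigma(fit)^2

all.equal(var_res, var(resid(fit)))   # TRUE: residuals have mean zero
all.equal(ols_var, sigma(fit)^2)      # TRUE

c(ml = ml_var, var_resid = var_res, unbiased = ols_var)

# Rescaling the ML value printed in the question by n / (n - 1):
64.712 * 234 / 233   # about 64.99, close to the 64.98952 from var(resid(glm.01))
```

So the small gap in the question looks like nothing more than dividing the same sum of squares by 234 versus 233; the remaining difference in the last digits comes from the rounding of 64.712 in the printed output.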
Note that for the likelihood-ratio test, using classic ML (instead of REML) is still necessary and correct. With REML the comparison of the two likelihoods would not be correct, because the REML criterion is not comparable to the ML log-likelihood reported for glm.02.
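The reason the ML fit is the right partner for this comparison can be sketched in base R: logLik() for lm and for a Gaussian glm is the normal log-likelihood evaluated at the ML variance RSS / n, not at the unbiased one. The data below are simulated and the variable names are hypothetical.

```r
# Simulated data; hypothetical stand-in for the real dataset
set.seed(2)
n <- 234
x <- rnorm(n)
y <- 140 + 1.3 * x + rnorm(n, sd = 8)

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian)

# ML estimate of the residual variance: divide by n
sigma2_ml <- sum(resid(fit_lm)^2) / n

# Gaussian log-likelihood computed by hand at the ML variance
ll_manual <- sum(dnorm(y, mean = fitted(fit_lm),
                       sd = sqrt(sigma2_ml), log = TRUE))

ll_lm  <- as.numeric(logLik(fit_lm))
ll_glm <- as.numeric(logLik(fit_glm))

all.equal(ll_manual, ll_lm)   # TRUE
all.equal(ll_lm, ll_glm)      # TRUE
```

Because both logLik() values are already ML log-likelihoods, setting them against an lmer fit with REML=FALSE compares like with like, which matches the near-zero difference seen in the question.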
Cheers!