
Residual variance extracted from glm and lmer in R

I am trying to take what I have read about multilevel modelling and merge it with what I know about glm in R. I am now using the height growth data from here (the file OXBOYS.DAT in the code below).

I have written the code shown below:

library(lme4)
library(ggplot2)

setwd("~/Documents/r_code/multilevel_modelling/")

rm(list=ls())

oxford.df <- read.fwf("oxboys/OXBOYS.DAT",widths=c(2,7,6,1))
names(oxford.df) <- c("stu_code","age_central","height","occasion_id")
oxford.df <- oxford.df[!is.na(oxford.df[,"age_central"]),]
oxford.df[,"stu_code"] <- factor(as.character(oxford.df[,"stu_code"]))
oxford.df[,"dummy"] <- 1

chart <- ggplot(data=oxford.df,aes(x=occasion_id,y=height))
chart <- chart + geom_point(aes(colour=stu_code))

# see if lm and glm give the same estimate
glm.01 <- lm(height~age_central+occasion_id,data=oxford.df)
glm.02 <- glm(height~age_central+occasion_id,data=oxford.df,family="gaussian")
summary(glm.02)
vcov(glm.02)
var(resid(glm.02))
(logLik(glm.01)*-2)-(logLik(glm.02)*-2)
1-pchisq(-2.273737e-13,1)
# lm and glm give the same estimation
# so glm.02 will be used from now on

# see if lmer without level2 variable give same result as glm.02
mlm.03 <- lmer(height~age_central+occasion_id+(1|dummy),data=oxford.df,REML=FALSE)
(logLik(glm.02)*-2)-(logLik(mlm.03)*-2)
# 1-pchisq(-3.408097e-07,1)
# glm.02 and mlm.03 give the same estimation, only if REML=FALSE

mlm.03 gives me the following output:

> mlm.03
Linear mixed model fit by maximum likelihood 
Formula: height ~ age_central + occasion_id + (1 | dummy) 
   Data: oxford.df 
  AIC  BIC logLik deviance REMLdev
 1650 1667 -819.9     1640    1633
Random effects:
 Groups   Name        Variance Std.Dev.
 dummy    (Intercept)  0.000   0.0000  
 Residual             64.712   8.0444  
Number of obs: 234, groups: dummy, 1

Fixed effects:
            Estimate Std. Error t value
(Intercept)  142.994     21.132   6.767
age_central    1.340     17.183   0.078
occasion_id    1.299      4.303   0.302

Correlation of Fixed Effects:
            (Intr) ag_cnt
age_central  0.999       
occasion_id -1.000 -0.999

You can see that there is a variance for the residual in the random effects section. According to Applied Multilevel Analysis: A Practical Guide by Jos W. R. Twisk, this represents the amount of "unexplained variance" in the model.

I wondered if I could arrive at the same residual variance from glm.02 (or, equivalently, glm.01, which gives the same fit), so I tried the following:

> var(resid(glm.01))
[1] 64.98952
> sd(resid(glm.01))
[1] 8.061608

The results are slightly different from the mlm.03 output. Is this the same "residual variance" as the one reported by mlm.03?

Your glm.01 and glm.02 fit an ordinary linear regression model by least squares. On the other hand, mlm.03 is a linear mixed model estimated by maximum likelihood. I don't know your dataset, but it looks like you use the dummy variable to create a level-2 cluster structure with zero variance.

So your question basically has two answers, but only the second one matters in your case. The models glm.02 and mlm.03 do not give the same residual variance estimate, because...

  1. The models are usually different (mixed effects vs. classical regression). In your case, however, the dummy variable has only one level, which seems to suppress the additional variance component in the mixed model (its variance is estimated as zero). So to me the two models seem to be equivalent.

  2. The method used to estimate the residual variance is different. glm uses LS, while lmer uses ML in your code. ML estimates of the residual variance are slightly biased downward, because ML divides the residual sum of squares by n rather than by the residual degrees of freedom. This can be solved by using REML instead of ML to estimate the variance components.
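As a rough check of point 2, the arithmetic can be redone with only the numbers printed in the question. This is a sketch, not output from the original data: it assumes n = 234 (from "Number of obs: 234") and p = 3 fixed-effect coefficients, and it uses the fact that var() divides the residual sum of squares by n - 1, which is neither the ML divisor (n) nor the unbiased one (n - p).

```r
# Numbers copied from the question's output (no data file needed)
n       <- 234        # "Number of obs: 234"
p       <- 3          # intercept + age_central + occasion_id
var_res <- 64.98952   # var(resid(glm.01)); var() divides by n - 1

# Residual sum of squares (residuals have mean zero when an intercept is fitted)
rss <- var_res * (n - 1)

sigma2_ml       <- rss / n        # ML estimate: divides by n
sigma2_unbiased <- rss / (n - p)  # unbiased estimate: divides by n - p

round(c(ml = sigma2_ml, var_resid = var_res, unbiased = sigma2_unbiased), 3)
round(sqrt(sigma2_ml), 4)         # compare with the Std.Dev. column of mlm.03
```

The ML value comes out at about 64.712 (standard deviation about 8.0444), which matches the Residual line of mlm.03. The unbiased value, about 65.552, is what summary(glm.02)$dispersion should report, and what a REML fit would be expected to give as long as the random-intercept variance stays at zero.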

For the likelihood-ratio test in your code, using classic ML (instead of REML) is still necessary and correct. logLik(glm.02) is an ML log-likelihood, so the lmer fit it is compared with must also be an ML fit. With REML, the comparison of the two likelihoods would not be correct.
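To illustrate why the two criteria should not be mixed, here is a small base-R sketch on simulated data (the variables x and y and the seed are made up for illustration; this is not the OXBOYS data). It checks that logLik() of an lm fit is the Gaussian log-likelihood evaluated at the ML variance RSS/n, and that the REML criterion for the very same fit is a different number.

```r
set.seed(1)                       # simulated data, for illustration only
n <- 234
x <- rnorm(n)
y <- 140 + 1.3 * x + rnorm(n, sd = 8)
fit <- lm(y ~ x)

# Gaussian log-likelihood evaluated at the ML variance estimate RSS / n
rss       <- sum(resid(fit)^2)
sigma2_ml <- rss / n
ll_manual <- -n / 2 * (log(2 * pi * sigma2_ml) + 1)

ll_ml   <- as.numeric(logLik(fit))               # default: ML
ll_reml <- as.numeric(logLik(fit, REML = TRUE))  # restricted likelihood

c(manual = ll_manual, ml = ll_ml, reml = ll_reml)
```

The first two values agree, while the REML value does not, so subtracting a REML log-likelihood from logLik(glm.02) would produce a nonzero "test statistic" even for identical models.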

Cheers!
