How to do significance testing with anova() on lm and lmer objects while respecting variable contrast coding?

Question

I am confused about the relationship between the significance test result shown in the output of summary() called on an lm or lmer object, and the result shown in the output of anova() called on that same object. Specifically, I don't understand (a) why, for factors with df=1 (for which it should be possible to compare results), the results don't always agree; and (b) why summary() respects the contrast weights assigned to each factor but anova() does not.

Here is an example for lm:

data(iris)

## Apply treatment coding to Species, and fit model
contrasts(iris$Species) <- contr.treatment(length(levels(iris$Species)))
iris.lm.treatment <- lm(Sepal.Length ~ Petal.Length * Species, data=iris)

# check Petal.Length p-value in lm() output
coef(summary(iris.lm.treatment))["Petal.Length","Pr(>|t|)"]
[1] 0.05199902 

# check Petal.Length p-value in anova() output
as.data.frame(anova(iris.lm.treatment))["Petal.Length","Pr(>F)"]
[1] 1.244558e-56


## Apply sum coding to Species, and fit model
contrasts(iris$Species) <- contr.sum(length(levels(iris$Species)))/2
iris.lm.sum <- lm(Sepal.Length ~ Petal.Length * Species, data=iris)

# check Petal.Length p-value in lm() output
coef(summary(iris.lm.sum))["Petal.Length","Pr(>|t|)"]
[1] 2.091453e-12 

# check Petal.Length p-value in anova() output
as.data.frame(anova(iris.lm.sum))["Petal.Length","Pr(>F)"]
[1] 1.244558e-56

The significance test of Petal.Length in the fitted lm changes when the contrast coding of Species changes. That makes sense, because the model evaluates each coefficient with the other coded predictors held constant at zero, and what "zero" means for Species depends on its coding. However, the significance test of Petal.Length in the anova() result is the same either way, and does not match the result from either lm.
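A minimal base-R check of this observation, relying only on what ?anova.lm documents: for a single model, anova() gives a sequential (Type I) table, i.e. the reduction in residual sum of squares as each term is added in formula order. Each row therefore compares two nested model spaces rather than testing an individual coefficient, so it cannot depend on how Species is coded. In particular, the Petal.Length sum of squares is the drop in RSS from `~ 1` to `~ Petal.Length`, tested against the full model's residual mean square. The object names (`d`, `fit.trt`, `fit.sum`, `rss`, ...) are illustrative only.

```r
# Base R only: iris ships with the datasets package.
d <- iris

# Fit the same model under treatment coding and under (halved) sum coding
contrasts(d$Species) <- contr.treatment(nlevels(d$Species))
fit.trt <- lm(Sepal.Length ~ Petal.Length * Species, data = d)
contrasts(d$Species) <- contr.sum(nlevels(d$Species)) / 2
fit.sum <- lm(Sepal.Length ~ Petal.Length * Species, data = d)

# Coefficient-level p-values for Petal.Length differ between the codings...
p.coef <- c(treatment = coef(summary(fit.trt))["Petal.Length", "Pr(>|t|)"],
            sum       = coef(summary(fit.sum))["Petal.Length", "Pr(>|t|)"])

# ...but the sequential ANOVA tables agree term by term
tab.trt <- anova(fit.trt)
tab.sum <- anova(fit.sum)
same.ss <- isTRUE(all.equal(tab.trt[["Sum Sq"]], tab.sum[["Sum Sq"]]))
same.p  <- isTRUE(all.equal(tab.trt[["Pr(>F)"]], tab.sum[["Pr(>F)"]]))

# The Petal.Length row is the RSS reduction from ~ 1 to ~ Petal.Length
rss <- function(f) sum(residuals(f)^2)
ss.first <- rss(lm(Sepal.Length ~ 1, data = d)) -
            rss(lm(Sepal.Length ~ Petal.Length, data = d))
same.first <- isTRUE(all.equal(ss.first, tab.trt["Petal.Length", "Sum Sq"]))

# Its F statistic divides by the *full* model's residual mean square
F.first <- ss.first / (rss(fit.trt) / df.residual(fit.trt))
```

So the Petal.Length row answers "does Petal.Length help when it is the only predictor?", which is a different question from either coefficient test in summary().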

The behavior with lmer (with significance testing accomplished via the lmerTest package) is confusing in a related way:

library(lmerTest)
data(ham)

## Apply treatment coding to variables, and fit model
contrasts(ham$Information) <- contr.treatment(length(levels(ham$Information)))
contrasts(ham$Product    ) <- contr.treatment(length(levels(ham$Product    )))
ham.lmer.treatment <- lmer(Informed.liking ~ Information * Product + (1 | Consumer) + (1 | Consumer:Product), data=ham)

# check Information p-value in lmer() output
coef(summary(ham.lmer.treatment))["Information2","Pr(>|t|)"]
[1] 0.4295516

# check Information p-value in anova() output
as.data.frame(anova(ham.lmer.treatment))["Information","Pr(>F)"]
[1] 0.04885354


## Apply sum coding to variables, and fit model
contrasts(ham$Information) <- contr.sum(length(levels(ham$Information)))/2
contrasts(ham$Product    ) <- contr.sum(length(levels(ham$Product    )))/2
ham.lmer.sum <- lmer(Informed.liking ~ Information * Product + (1 | Consumer) + (1 | Consumer:Product), data=ham)

# check Information p-value in lmer() output
coef(summary(ham.lmer.sum))["Information1","Pr(>|t|)"]
[1] 0.04885354

# check Information p-value in anova() output
as.data.frame(anova(ham.lmer.sum))["Information","Pr(>F)"]
[1] 0.04885354

Here, it is still the case that variable coding appears to affect the results shown in the output of summary() but not the results shown in the output of anova(). However, both anova() results match the lmer() result obtained when sum coding is used.

It seems to me that in both cases, anova() is ignoring the variable codings used and is using some other coding (which, in the lmer case, appears to be sum coding) to evaluate significance. I would like to know how to perform a statistical test that uses the assigned variable codings. For lmer, at least, I can accomplish this with contestMD(), e.g.,

# test Information significance while respecting contrast weights
contestMD(ham.lmer.treatment, as.numeric(names(fixef(ham.lmer.treatment))=="Information2"))[,"Pr(>F)"]
[1] 0.4295516   # matches output from summary(ham.lmer.treatment)

However, I can't figure out how to do the equivalent test for lm (presumably using glht(), but I can't figure out the right function call). So, my questions are:

1. Conceptually, why does anova() not respect the assigned variable codings? (Presumably this is all intended behavior, but the reason is opaque to me.)
2. Practically, what variable coding is being used by anova() when called on an lm object?
3. How can I perform the kind of significance testing I want with an lm object? (I used examples with df=1 above because they can be compared between the model output and the anova() output, but of course what I really want to do is test effects that have df>1.)
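One way to approach the third question, sketched in base R and assuming an ordinary lm fit: test H0: L %*% beta = 0 directly on the fitted coefficients, so the hypothesis is expressed in exactly the coding the model was fitted with. For lm, the Wald statistic t(Lb) %*% solve(L %*% V %*% t(L)) %*% Lb / q (with b = coef(fit), V = vcov(fit), q = nrow(L)) is algebraically the same as the extra-sum-of-squares F for dropping those coefficients, so one row reproduces the summary() p-value and several rows give a df>1 test. `coef_F_test` and `coef_rows` are hypothetical helper names, not functions from any package. The packaged routes should be multcomp::glht(fit, linfct = L) followed by summary(..., test = Ftest()), or car::linearHypothesis(fit, L), but the sketch does not depend on them.

```r
# Hypothetical helper (not from any package): F test of H0: L %*% beta = 0
# for an lm fit, using the coefficients exactly as they were coded.
coef_F_test <- function(fit, L) {
  L <- rbind(L)                               # a single contrast may be given as a vector
  Lb <- L %*% coef(fit)
  q <- nrow(L)
  Fstat <- as.numeric(crossprod(Lb, solve(L %*% vcov(fit) %*% t(L), Lb))) / q
  df2 <- df.residual(fit)
  c(F = Fstat, df1 = q, df2 = df2,
    p = pf(Fstat, q, df2, lower.tail = FALSE))
}

# Hypothetical helper: build L with one row per named coefficient
coef_rows <- function(fit, which) {
  nm <- names(coef(fit))
  L <- matrix(0, nrow = length(which), ncol = length(nm),
              dimnames = list(which, nm))
  L[cbind(seq_along(which), match(which, nm))] <- 1
  L
}

d <- iris
contrasts(d$Species) <- contr.treatment(nlevels(d$Species))
fit <- lm(Sepal.Length ~ Petal.Length * Species, data = d)

# df = 1: reproduces the summary() p-value for Petal.Length (F = t^2)
one <- coef_F_test(fit, coef_rows(fit, "Petal.Length"))

# df = 2: joint test of both interaction coefficients
inter <- coef_F_test(fit, coef_rows(fit, c("Petal.Length:Species2",
                                           "Petal.Length:Species3")))
```

Because the interaction is the last term in the formula, `inter` should coincide with the interaction row of anova(fit); for the main-effect columns, only the coefficient-based test reflects the chosen coding.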

Answer 1

I still haven't answered my first two questions, but in answer to the third one, it seems I can get the results I want by creating subset models, each with one factor removed, and comparing each one to the full model using anova(). For the example given above (iris.lm.treatment), I could do the following. (In my example, I've gone to the trouble of first re-fitting the model with explicitly numeric predictors, as I otherwise run into difficulties when using anova() to compare models.)

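# re-apply treatment coding first: the question's code left Species with sum coding,
# and model.matrix() below uses whatever contrasts are currently set on the factor
contrasts(iris$Species) <- contr.treatment(length(levels(iris$Species)))

# create numeric columns with the same contrast codings as the nominal factor
Species.numeric <- as.data.frame(model.matrix(~ Species, data=iris))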

# drop Intercept column
Species.numeric <- Species.numeric[,2:ncol(Species.numeric)]

# rename columns as Species.num1 & Species.num2 and append to iris
names(Species.numeric) <- paste0("Species.num", 1:ncol(Species.numeric))
iris <- cbind(iris, Species.numeric)

# re-fit lm with all numeric predictors
iris.lm.treatment.num <- lm(Sepal.Length ~ Petal.Length * (Species.num1 + Species.num2), data=iris)

# for each factor, create a subset model that has that factor removed
iris.lm.treatment.num.noPetalLength <- update(iris.lm.treatment.num, . ~ . - Petal.Length                              )
iris.lm.treatment.num.noSpecies     <- update(iris.lm.treatment.num, . ~ . - (Species.num1 + Species.num2)             )
iris.lm.treatment.num.noInteraction <- update(iris.lm.treatment.num, . ~ . - Petal.Length:(Species.num1 + Species.num2))

# use anova() to compare each subset model to the full model
anova(iris.lm.treatment.num.noPetalLength, iris.lm.treatment.num)   # p =  .052
anova(iris.lm.treatment.num.noSpecies,     iris.lm.treatment.num)   # p = 7.611e-06
anova(iris.lm.treatment.num.noInteraction, iris.lm.treatment.num)   # p =  .1895

The main effect of petal length yields a p-value of .052, which matches the result in iris.lm.treatment.
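The subset-model approach can be generalized so that the numeric columns do not have to be built by hand: refit the model from its own model matrix and drop the columns that belong to the effect being tested. This is a minimal sketch, assuming a plain lm fit with no weights, offsets or missing values. `drop_columns_test` is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: compare an lm fit with a refit that omits the named
# columns of its model matrix, so the fit's own contrast coding is kept.
# Assumes no weights, offsets or rows dropped for missing values.
drop_columns_test <- function(fit, drop) {
  y <- model.response(model.frame(fit))
  X <- model.matrix(fit)
  stopifnot(all(drop %in% colnames(X)))
  X0 <- X[, setdiff(colnames(X), drop), drop = FALSE]
  full    <- lm(y ~ X - 1)    # same column space as fit; the intercept is a column of X
  reduced <- lm(y ~ X0 - 1)
  anova(reduced, full)
}

d <- iris
contrasts(d$Species) <- contr.treatment(nlevels(d$Species))
fit <- lm(Sepal.Length ~ Petal.Length * Species, data = d)

# df = 1: same p-value as the Petal.Length row of summary(fit)
t1 <- drop_columns_test(fit, "Petal.Length")

# df = 2: the two Species main-effect columns
t2 <- drop_columns_test(fit, c("Species2", "Species3"))
```

Under treatment coding, `t2` tests species differences at Petal.Length = 0, which is the same hypothesis as the noSpecies comparison above, because both reduced models span the same columns.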

Solution 1 (score 0, accepted), 2021-10-25 02:11:47
