Unequal variance in R linear mixed model post-hoc lsmeans issue
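I fit a linear mixed model to agricultural data, accounting for unequal variance among groups (cultivars) by passing weights = varIdent(...) to lme. lsmeans reports standard errors that are inconsistent with the significant differences it shows.
My code example is below. Data to reproduce the output can be found here.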
library(nlme)
library(multcomp)
library(emmeans)
model <- lme(variable ~ cultivar*year,
             random = ~1|block,
             weights = varIdent(form = ~1|cultivar),
             method = "REML",
             na.action = na.omit,
             data = ag.data)
Leastsquare <- lsmeans(model,"cultivar")
cld(Leastsquare, Letters = letters)
 cultivar  lsmean     SE df lower.CL upper.CL .group
 Golden      6.92  3.841  1    -41.9     55.7  a
 Campfield  10.33  4.330  1    -44.7     65.4  a
 Tom        17.50  0.167  1     15.4     19.6  a
 Harrison   25.67 12.649  1   -135.1    186.4  ab
 Puget      30.58 20.502  1   -229.9    291.1  ab
 HVC        37.08  5.331  1    -30.7    104.8   b
 COL        38.08  0.433  1     32.6     43.6   b
 Brown      62.67 20.207  1   -194.1    319.4  ab
How is it possible that cultivar Brown is not significantly different from Golden? Is this acceptable? Has anyone seen similar results?
I turned your example into a reprex. Please find my comments below.
library(tidyverse)
ag.data <- tibble::tribble(
~year, ~cultivar, ~block, ~variable,
"nineteen", "HVC", 1L, 14.33333333,
"nineteen", "HVC", 2L, 23.33333333,
"nineteen", "Puget", 1L, 2.333333333,
"nineteen", "Puget", 2L, 3.333333333,
"nineteen", "Campfield", 1L, NA,
"nineteen", "Campfield", 2L, 4,
"nineteen", "Tom", 1L, 10,
"nineteen", "Tom", 2L, 10,
"nineteen", "Brown", 1L, NA,
"nineteen", "Brown", 2L, 56.66666667,
"nineteen", "COL", 1L, NA,
"nineteen", "COL", 2L, 51.66666667,
"nineteen", "Golden", 1L, 5,
"nineteen", "Golden", 2L, 1.666666667,
"nineteen", "Harrison", 1L, 52.33333333,
"nineteen", "Harrison", 2L, 4.333333333,
"twenty", "HVC", 1L, 45.66666667,
"twenty", "HVC", 2L, 65,
"twenty", "Puget", 1L, 17.33333333,
"twenty", "Puget", 2L, 99.33333333,
"twenty", "Campfield", 1L, 11.66666667,
"twenty", "Campfield", 2L, 21.66666667,
"twenty", "Tom", 1L, 25.33333333,
"twenty", "Tom", 2L, 24.66666667,
"twenty", "Brown", 1L, 45.33333333,
"twenty", "Brown", 2L, 92,
"twenty", "COL", 1L, 24,
"twenty", "COL", 2L, 25,
"twenty", "Golden", 1L, 3,
"twenty", "Golden", 2L, 18,
"twenty", "Harrison", 1L, 31,
"twenty", "Harrison", 2L, 15
)
library(nlme)
library(emmeans)
library(multcomp)
library(multcompView)
model <- lme(
  variable ~ cultivar * year,
  random = ~ 1 | block,
  weights = varIdent(form = ~ 1 | cultivar),
  method = "REML",
  na.action = na.omit,
  data = ag.data
)
anova(model)
#>               numDF denDF   F-value p-value
#> (Intercept)       1    12 16502.874  <.0001
#> cultivar          7    12   193.823  <.0001
#> year              1    12   952.713  <.0001
#> cultivar:year     7    12   296.145  <.0001
ag.data %>%
  filter(!is.na(variable)) %>%
  ggplot(aes(y = variable, x = year)) +
  facet_wrap(vars(cultivar)) +
  geom_point() +
  stat_summary(fun = mean,
               color = "red",
               geom = "line",
               aes(group = 1)) +
  theme_bw()
emm <- emmeans(model, ~ cultivar) %>%
  cld(Letters = letters) %>%
  as_tibble() %>%
  mutate(cultivar = fct_reorder(cultivar, emmean))
#> NOTE: Results may be misleading due to involvement in interactions
ggplot(emm, aes(
  y = emmean,
  ymin = lower.CL,
  ymax = upper.CL,
  x = cultivar,
  label = str_trim(.group)
)) +
  geom_point() +
  geom_errorbar(width = 0.1) +
  geom_text(
    position = position_nudge(x = 0.1),
    hjust = 0,
    color = "red"
  ) +
  theme_bw()
Created on 2022-01-24 by the reprex package (v2.0.1)
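To address your question of why the cultivars with the lowest and highest emmeans are not significantly different: I think the second plot makes it clear that this is because the emmeans are estimated with very different precision. That, in turn, is partly due to the heterogeneous error variances you allowed in the model and partly due to the missing/unbalanced data. I would argue that you are "not used to seeing extremes that are not statistically different from each other while intermediate values are" because, with balanced data and/or no heterogeneous error variances, you usually would not see that. Try running the code without the weights = argument (i.e. with the standard homogeneous error variance): you will not run into the "problem" you have with these results.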
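A minimal sketch of that check, in case it helps. The object names model_hom and emm_hom are my own and not part of the original code, and the data frame is just the reprex data rebuilt compactly so the snippet runs on its own. pairs() is included because it lists every pairwise difference with its own standard error, which is what the letters are actually based on.

```r
library(nlme)
library(emmeans)
library(multcomp)      # supplies the cld() generic
library(multcompView)  # used by cld() to build the letter display

# Same values as in the reprex, rebuilt compactly (order: year, cultivar, block)
cultivars <- c("HVC", "Puget", "Campfield", "Tom",
               "Brown", "COL", "Golden", "Harrison")
ag.data <- data.frame(
  year     = rep(c("nineteen", "twenty"), each = 16),
  cultivar = rep(rep(cultivars, each = 2), times = 2),
  block    = rep(1:2, times = 16),
  variable = c(14.33333333, 23.33333333, 2.333333333, 3.333333333,
               NA, 4, 10, 10,
               NA, 56.66666667, NA, 51.66666667,
               5, 1.666666667, 52.33333333, 4.333333333,
               45.66666667, 65, 17.33333333, 99.33333333,
               11.66666667, 21.66666667, 25.33333333, 24.66666667,
               45.33333333, 92, 24, 25,
               3, 18, 31, 15)
)

# Same fixed and random effects as before, but no weights = varIdent(...):
# one residual variance shared by all cultivars
model_hom <- lme(
  variable ~ cultivar * year,
  random = ~ 1 | block,
  method = "REML",
  na.action = na.omit,
  data = ag.data
)

emm_hom <- emmeans(model_hom, ~ cultivar)
cld(emm_hom, Letters = letters)  # compact letter display
pairs(emm_hom)                   # each pairwise difference with its own SE
```

Comparing the SE column of pairs() between this fit and the heterogeneous one should show where the two letter displays diverge.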
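Further note that you actually seem to have a cultivar-year interaction: see anova() and the first plot. Looking at the cultivar means averaged over years can therefore be misleading, as the note printed below the emmeans() call says. Instead, you could investigate the emmean comparisons separately for each year via emmeans(~ cultivar|year).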
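A sketch of what the by-year comparison could look like, again rebuilding the reprex data so it runs on its own. emm_by_year is a name I made up. Keep in mind that each cultivar-year cell holds at most two observations, so the intervals will likely be wide.

```r
library(nlme)
library(emmeans)
library(multcomp)      # supplies the cld() generic
library(multcompView)  # used by cld() to build the letter display

# Same values as in the reprex, rebuilt compactly (order: year, cultivar, block)
cultivars <- c("HVC", "Puget", "Campfield", "Tom",
               "Brown", "COL", "Golden", "Harrison")
ag.data <- data.frame(
  year     = rep(c("nineteen", "twenty"), each = 16),
  cultivar = rep(rep(cultivars, each = 2), times = 2),
  block    = rep(1:2, times = 16),
  variable = c(14.33333333, 23.33333333, 2.333333333, 3.333333333,
               NA, 4, 10, 10,
               NA, 56.66666667, NA, 51.66666667,
               5, 1.666666667, 52.33333333, 4.333333333,
               45.66666667, 65, 17.33333333, 99.33333333,
               11.66666667, 21.66666667, 25.33333333, 24.66666667,
               45.33333333, 92, 24, 25,
               3, 18, 31, 15)
)

# The heterogeneous-variance model from the question
model <- lme(
  variable ~ cultivar * year,
  random = ~ 1 | block,
  weights = varIdent(form = ~ 1 | cultivar),
  method = "REML",
  na.action = na.omit,
  data = ag.data
)

# Cultivar means within each year rather than averaged over years
emm_by_year <- emmeans(model, ~ cultivar | year)
cld(emm_by_year, Letters = letters)  # letters are assigned separately per year
pairs(emm_by_year)                   # pairwise cultivar differences within each year
```

Because year is used as a by-variable, the multiplicity adjustment is applied within each year, not across the full set of 16 cell means.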