How does R's lm choose contrasts for an interaction between a categorical and a continuous variable?
If I run lm with a formula like Y ~ X1 + X2:X1 + X3:X1, where X1 is continuous and X2 and X3 are categorical, I get a coefficient for both levels of X2, but not for X3.
The pattern is that the first categorical interaction gets both levels, but the second does not.
library(tidyverse)
library(magrittr)
#>
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#>
#> set_names
#> The following object is masked from 'package:tidyr':
#>
#> extract
df <- data.frame(Frivolousness = sample(1:100, 50, replace = TRUE))
df %<>% mutate(
  Personality = sample(c("Bad", "Good"), 50, replace = TRUE),
  Timing = ifelse(Frivolousness %% 2 == 0 & runif(50) > 0.2, "Early", "Late")
)
df %<>% mutate(
  Enchantedness = 11 +
    ifelse(Personality == "Good", 0.23, -0.052) * Frivolousness -
    1.3 * ifelse(Personality == "Good", 1, 0) +
    10 * rnorm(50)
)
df %<>% mutate(
  Personality = factor(Personality, levels = c("Bad", "Good")),
  Timing = factor(Timing, levels = c("Early", "Late"))
)
lm(Enchantedness ~ Personality + Timing + Timing:Frivolousness + Personality:Frivolousness, df)
#>
#> Call:
#> lm(formula = Enchantedness ~ Personality + Timing + Timing:Frivolousness +
#> Personality:Frivolousness, data = df)
#>
#> Coefficients:
#> (Intercept) PersonalityGood
#> 15.64118 -10.99518
#> TimingLate TimingEarly:Frivolousness
#> -1.41757 -0.05796
#> TimingLate:Frivolousness PersonalityGood:Frivolousness
#> -0.07433 0.33410
lm(Enchantedness ~ Personality + Timing + Personality:Frivolousness + Timing:Frivolousness, df)
#>
#> Call:
#> lm(formula = Enchantedness ~ Personality + Timing + Personality:Frivolousness +
#> Timing:Frivolousness, data = df)
#>
#> Coefficients:
#> (Intercept) PersonalityGood
#> 15.64118 -10.99518
#> TimingLate PersonalityBad:Frivolousness
#> -1.41757 -0.05796
#> PersonalityGood:Frivolousness TimingLate:Frivolousness
#> 0.27614 -0.01636
Created on 2020-02-15 by the reprex package (v0.3.0)
I think the reason it is dropped is that there would be perfect collinearity if it were included. You really should also have Frivolousness as a regressor on its own. Then you will see that R gives you a coefficient for just one level of each interaction.
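The collinearity can be checked directly. The following is only a minimal sketch on made-up data (the names d, x, f1, f2 and full are hypothetical and not from the question): it builds the per-level slope columns for two factors by hand and shows that the four columns carry only three independent directions, because both pairs sum to x.

```r
set.seed(1)

# Hypothetical toy data: one continuous variable and two crossed two-level factors
d <- data.frame(
  x  = runif(20),
  f1 = factor(rep(c("a", "b"), times = 10)),
  f2 = factor(rep(c("u", "v"), each = 10))
)

# Per-level slope columns for both factors, built by hand
full <- cbind(
  a_x = (d$f1 == "a") * d$x,
  b_x = (d$f1 == "b") * d$x,
  u_x = (d$f2 == "u") * d$x,
  v_x = (d$f2 == "v") * d$x
)

ncol(full)     # number of candidate columns
qr(full)$rank  # numerical rank: one less than the number of columns

# The redundancy: both pairs of columns add up to x
all.equal(full[, "a_x"] + full[, "b_x"], full[, "u_x"] + full[, "v_x"])

# Columns lm would actually use for the interaction-only formula
colnames(model.matrix(~ f1:x + f2:x, data = d))
```

Since one of the four columns is redundant, one of them has to go, and the output in the question shows it is a level of the second interaction that goes.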
You get this odd behavior because you are missing the main-effect term, Frivolousness. If you do:
set.seed(111)
## run your data frame stuff
lm(Enchantedness ~ Personality + Timing + Timing:Frivolousness + Personality:Frivolousness, df)
Coefficients:
(Intercept) PersonalityGood
-1.74223 5.31189
TimingLate TimingEarly:Frivolousness
12.47243 0.19090
TimingLate:Frivolousness PersonalityGood:Frivolousness
-0.09496 0.17383
lm(Enchantedness ~ Personality + Timing + Frivolousness + Timing:Frivolousness + Personality:Frivolousness, df)
Coefficients:
(Intercept) PersonalityGood
-1.7422 5.3119
TimingLate Frivolousness
12.4724 0.1909
TimingLate:Frivolousness PersonalityGood:Frivolousness
-0.2859 0.1738
When the main effect is in the model, the interaction term TimingLate:Frivolousness is the change in the slope of Frivolousness when Timing is Late. In your model that baseline slope is not estimated on its own, so R has to estimate it for TimingEarly (the reference level) instead. Hence the coefficients for TimingEarly:Frivolousness and Frivolousness are the same (0.1909).
As you can see, the TimingLate:Frivolousness coefficients are very different between the two fits: -0.09496 is the slope itself when Timing is Late, while -0.2859 is the change relative to the baseline slope (0.1909 - 0.2859 ≈ -0.0950). In your case I don't think it makes sense to include only the interaction term without the main effect, because the result is hard to interpret.
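To make the link between the two parameterizations concrete, here is a small sketch on simulated data. The data frame d and the names x, g, h, y, m_no_main and m_main are made up for illustration and are not the question's data; the point is only that the two formulas give the same fit and differ in how the slopes are reported.

```r
set.seed(42)
n <- 50

# Hypothetical data with the same shape as the question's: one continuous
# predictor and two two-level factors
d <- data.frame(
  x = runif(n, 1, 100),
  g = factor(sample(c("Early", "Late"), n, replace = TRUE)),
  h = factor(sample(c("Bad", "Good"), n, replace = TRUE))
)
d$y <- 10 + 0.2 * d$x * (d$g == "Late") + rnorm(n)

m_no_main <- lm(y ~ g + h + g:x + h:x, data = d)      # no main effect of x
m_main    <- lm(y ~ g + h + x + g:x + h:x, data = d)  # main effect included

# Same fitted values, so the two formulas describe the same model
all.equal(fitted(m_no_main), fitted(m_main))

# Baseline slope: reported as gEarly:x in one fit and as x in the other
coef(m_no_main)["gEarly:x"]
coef(m_main)["x"]

# Slope for Late: estimated directly vs. as baseline plus change
coef(m_no_main)["gLate:x"]
coef(m_main)["x"] + coef(m_main)["gLate:x"]
```

Which form to use is a matter of interpretation: with the main effect included, every interaction coefficient reads as a difference from the reference-level slope.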
You can roughly check what the slope is for each Timing group, and the model with all terms gives a good estimate:
df %>% group_by(Timing) %>% do(broom::tidy(lm(Enchantedness ~ Frivolousness, data = .)))
# A tibble: 4 x 6
# Groups: Timing [2]
Timing term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 Early (Intercept) 6.13 6.29 0.975 0.341
2 Early Frivolousness 0.208 0.0932 2.23 0.0366
3 Late (Intercept) 11.5 5.35 2.14 0.0419
4 Late Frivolousness -0.00944 0.107 -0.0882 0.930