How does R's lm choose contrasts for interactions between categorical and continuous variables?

Question

If I run lm with a formula like Y ~ X1 + X2:X1 + X3:X1, where X1 is continuous and X2 and X3 are categorical, I get a contrast for both levels of X2, but not for X3.

The pattern is that the first categorical interaction in the formula gets both levels, but the second does not.

library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

df = data.frame(Frivolousness = sample(1:100, 50, replace =T))
df %<>% mutate(
  Personality=sample(c("Bad", "Good"), 50, replace = T),
  Timing=ifelse(Frivolousness %% 2 == 0 & runif(50) > 0.2, "Early", "Late")
  )
df %<>% mutate(
  Enchantedness = 11 + 
    ifelse(Personality=="Good", 0.23, -0.052)*Frivolousness -
    1.3*ifelse(Personality=="Good", 1, 0) +
    10*rnorm(50)
  )
df %<>% mutate(
  Personality = factor(Personality, levels=c("Bad", "Good")),
  Timing = factor(Timing, levels=c("Early", "Late"))
)

lm(Enchantedness ~ Personality + Timing + Timing:Frivolousness + Personality:Frivolousness, df)
#> 
#> Call:
#> lm(formula = Enchantedness ~ Personality + Timing + Timing:Frivolousness + 
#>     Personality:Frivolousness, data = df)
#> 
#> Coefficients:
#>                   (Intercept)                PersonalityGood  
#>                      15.64118                      -10.99518  
#>                    TimingLate      TimingEarly:Frivolousness  
#>                      -1.41757                       -0.05796  
#>      TimingLate:Frivolousness  PersonalityGood:Frivolousness  
#>                      -0.07433                        0.33410

lm(Enchantedness ~ Personality + Timing + Personality:Frivolousness+ Timing:Frivolousness , df)
#> 
#> Call:
#> lm(formula = Enchantedness ~ Personality + Timing + Personality:Frivolousness + 
#>     Timing:Frivolousness, data = df)
#> 
#> Coefficients:
#>                   (Intercept)                PersonalityGood  
#>                      15.64118                      -10.99518  
#>                    TimingLate   PersonalityBad:Frivolousness  
#>                      -1.41757                       -0.05796  
#> PersonalityGood:Frivolousness       TimingLate:Frivolousness  
#>                       0.27614                       -0.01636

Created on 2020-02-15 by the reprex package (v0.3.0)

Answer 1

I think the reason it is dropped is that there would be perfect collinearity if it were included. You really should also have Frivolousness as a regressor on its own. Then you will see that R gives you a coefficient for just one level of each interaction.
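The collinearity can be checked directly. The following is a minimal sketch on made-up data (the `toy` data frame and the helper columns `early`, `late`, `bad`, `good` are assumptions for illustration, not the asker's data). It shows that the two pairs of level-by-slope columns each add up to Frivolousness, so all four cannot be estimated together.

```r
# Hypothetical toy data, loosely mirroring the question's variables (assumption).
set.seed(1)
n <- 50
toy <- data.frame(
  Frivolousness = sample(1:100, n, replace = TRUE),
  Personality   = factor(sample(c("Bad", "Good"), n, replace = TRUE),
                         levels = c("Bad", "Good")),
  Timing        = factor(sample(c("Early", "Late"), n, replace = TRUE),
                         levels = c("Early", "Late"))
)

# Build the level-by-slope columns by hand.
early <- (toy$Timing == "Early") * toy$Frivolousness
late  <- (toy$Timing == "Late")  * toy$Frivolousness
bad   <- (toy$Personality == "Bad")  * toy$Frivolousness
good  <- (toy$Personality == "Good") * toy$Frivolousness

# Each pair sums to Frivolousness, so early + late - bad - good is all zeros:
# the four columns are linearly dependent.
all(early + late == toy$Frivolousness)
all(bad + good == toy$Frivolousness)

# The design matrix built from the question's formula keeps only three of them.
X <- model.matrix(~ Personality + Timing + Timing:Frivolousness +
                    Personality:Frivolousness, data = toy)
colnames(X)
qr(X)$rank == ncol(X)   # the remaining columns are linearly independent
```

Swapping the order of the two interaction terms in the formula changes which interaction keeps both levels, which matches the pattern described in the question.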

Answer 2

You get this kind of weird behavior because you are missing the main-effect term, Frivolousness. Compare the two fits below:

set.seed(111)
## run your data frame stuff
lm(Enchantedness ~ Personality + Timing + Timing:Frivolousness + Personality:Frivolousness, df)

Coefficients:
                  (Intercept)                PersonalityGood  
                     -1.74223                        5.31189  
                   TimingLate      TimingEarly:Frivolousness  
                     12.47243                        0.19090  
     TimingLate:Frivolousness  PersonalityGood:Frivolousness  
                     -0.09496                        0.17383  

lm(Enchantedness ~ Personality + Timing + Frivolousness + Timing:Frivolousness + Personality:Frivolousness, df)

Coefficients:
                  (Intercept)                PersonalityGood  
                      -1.7422                         5.3119  
                   TimingLate                  Frivolousness  
                      12.4724                         0.1909  
     TimingLate:Frivolousness  PersonalityGood:Frivolousness  
                      -0.2859                         0.1738

In the model that includes the main effect, the interaction term TimingLate:Frivolousness is the change in the slope of Frivolousness when Timing is Late, relative to Early. In your model the baseline slope is not estimated by a main effect, so R has to estimate it for TimingEarly (the reference level) instead. Hence the coefficient for TimingEarly:Frivolousness in your model and the coefficient for Frivolousness in the full model are the same (0.1909).
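The relationship between the two sets of coefficients can be verified numerically. This is a sketch on simulated data (the `toy` data frame and the model names `no_main` and `with_main` are assumptions for illustration). It checks that both formulas describe the same fit and that the coefficients differ only in how the slopes are labelled.

```r
# Hypothetical toy data (assumption), with alphabetical factor levels
# so that "Bad" and "Early" are the reference levels.
set.seed(2)
n <- 60
toy <- data.frame(
  Frivolousness = runif(n, 1, 100),
  Personality   = factor(sample(c("Bad", "Good"), n, replace = TRUE)),
  Timing        = factor(sample(c("Early", "Late"), n, replace = TRUE))
)
toy$Enchantedness <- 11 + 0.2 * toy$Frivolousness + rnorm(n, sd = 10)

# The question's formula (no main effect) and the formula with the main effect.
no_main   <- lm(Enchantedness ~ Personality + Timing +
                  Timing:Frivolousness + Personality:Frivolousness, data = toy)
with_main <- lm(Enchantedness ~ Personality + Timing + Frivolousness +
                  Timing:Frivolousness + Personality:Frivolousness, data = toy)

b0 <- coef(no_main)
b1 <- coef(with_main)

# Same fit, different labels: the fitted values agree.
all.equal(fitted(no_main), fitted(with_main))

# The baseline slope appears as TimingEarly:Frivolousness without the main
# effect and as Frivolousness with it.
all.equal(unname(b0["TimingEarly:Frivolousness"]), unname(b1["Frivolousness"]))

# With the main effect, TimingLate:Frivolousness is a difference in slopes.
all.equal(unname(b1["TimingLate:Frivolousness"]),
          unname(b0["TimingLate:Frivolousness"] - b0["TimingEarly:Frivolousness"]))
```

The same arithmetic applies to the output above: 0.19090 + (-0.2859) is approximately -0.095, the TimingLate:Frivolousness value reported by the model without the main effect.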

As you can see, the TimingLate:Frivolousness coefficients differ a lot between the two models (-0.09496 versus -0.2859). In your case I don't think it makes sense to include only the interaction terms without the main effect, because the result is hard to interpret.

You can roughly check the slope within each Timing group; the model with all terms gives a good estimate:

library(broom)  # tidy() comes from broom, which library(tidyverse) does not attach
df %>% group_by(Timing) %>% do(tidy(lm(Enchantedness ~ Frivolousness, data = .)))
# A tibble: 4 x 6
# Groups:   Timing [2]
  Timing term          estimate std.error statistic p.value
  <fct>  <chr>            <dbl>     <dbl>     <dbl>   <dbl>
1 Early  (Intercept)    6.13       6.29      0.975   0.341 
2 Early  Frivolousness  0.208      0.0932    2.23    0.0366
3 Late   (Intercept)   11.5        5.35      2.14    0.0419
4 Late   Frivolousness -0.00944    0.107    -0.0882  0.930

Answer 1
0 2020-02-16 00:12:07

Answer 2
0 2020-02-16 00:58:13
