Interaction terms in multiple linear regression

I used lm() for my multiple regression analysis and then used gvlma for the assumption tests. The results showed that the Global Stat and Heteroskedasticity tests were not satisfied.

The code looks like this (all variables are continuous):

model_1 <- lm(y ~ x1 + x2, data = abc)

I then ran one more model with the same variables, thinking I had to introduce interaction terms to make the gvlma assumption tests pass:

model_2 <- lm(y ~ x1 + x2, x1 * x2, data = abc)

With model_2, all the assumptions are satisfied. When I checked, though, I realised I had not introduced the interaction term correctly. I can't see what that comma between the variables does.

This puts me in a difficult position. The model fits well, but I cannot explain what the ", x1 * x2" part does in the equation or the results.

Please help me understand.

In a linear model formula, : specifies an interaction term and + separates terms. A model with both single terms and their interaction is therefore:

lm(y ~ x1:x2 + x1 + x2)
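Here is a minimal sketch of what : contributes. The question's abc data isn't available, so it uses the built-in iris data, which is the same dataset used further down. model.matrix() shows the design matrix lm() would build: the interaction is a single column holding the row-by-row product of the two variables. The name X is illustrative only.

```r
# Design matrix lm() would use for the ':' formula (written in the same order as above)
X <- model.matrix(Petal.Length ~ Petal.Width:Sepal.Length + Petal.Width + Sepal.Length,
                  data = iris)

colnames(X)  # intercept, the two main effects, and the interaction column

# The interaction column is just Petal.Width * Sepal.Length, row by row
stopifnot(isTRUE(all.equal(unname(X[, "Petal.Width:Sepal.Length"]),
                           iris$Petal.Width * iris$Sepal.Length)))
```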

You can also write x1*x2, which expands to both single effects and the interaction. The following is therefore equivalent to the model above:

lm(y ~ x1*x2)
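The equivalence can be checked with a quick sketch, again on iris. The names fit_long and fit_star are illustrative only. Both spellings should give the same term labels and the same coefficients.

```r
# Spelled out: interaction plus both main effects
fit_long <- lm(Petal.Length ~ Petal.Width:Sepal.Length + Petal.Width + Sepal.Length,
               data = iris)

# Shorthand: '*' expands to the same three terms
fit_star <- lm(Petal.Length ~ Petal.Width * Sepal.Length, data = iris)

attr(terms(fit_star), "term.labels")  # the terms '*' expanded to

# Both formulas fit the same model, so the coefficients match
stopifnot(isTRUE(all.equal(coef(fit_long), coef(fit_star))))
```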

Here is what happens with the built-in dataset iris when the predictors are specified as Petal.Width*Sepal.Length. All three terms appear in the model summary:

Call:
lm(formula = Petal.Length ~ Petal.Width * Sepal.Length, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.99588 -0.24329  0.00355  0.29735  1.24780 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)              -3.24804    0.59586  -5.451 2.08e-07 ***
Petal.Width               2.97115    0.35836   8.291 6.74e-14 ***
Sepal.Length              0.87551    0.11667   7.504 5.60e-12 ***
Petal.Width:Sepal.Length -0.22248    0.06384  -3.485  0.00065 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3888 on 146 degrees of freedom
Multiple R-squared:  0.9525,    Adjusted R-squared:  0.9515 
F-statistic: 975.4 on 3 and 146 DF,  p-value: < 2.2e-16

As for what the comma is doing in your model: it creates a subset. Compare the summaries of the three models below.

- The first two have 146 and 147 residual degrees of freedom. Each uses all 150 data points, and they estimate 4 and 3 parameters respectively.
- The third model mimics your specification and has 129 degrees of freedom. That is what made me realise it was subsetting.

The documentation for lm() shows a subsetting argument: lm(formula, data, subset, ...). Because data is given by name, the two unnamed arguments are matched by position to the remaining arguments in order, which are formula and subset. The model summary confirms this: its Call line shows a subset argument.

summary(lm(Petal.Length ~ Petal.Width * Sepal.Length, data = iris))
summary(lm(Petal.Length ~ Petal.Width + Sepal.Length, data = iris))
summary(lm(Petal.Length ~ Petal.Width + Sepal.Length, Petal.Width * Sepal.Length, data = iris))
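To confirm that the unnamed second argument really lands in subset, the comma version can be compared with the same expression passed by name. The names m_comma and m_subset are illustrative only. If the fits agree, the comma version is simply a model on a subset of rows with no interaction term at all.

```r
# The question's pattern: an unnamed second argument after the formula
m_comma <- lm(Petal.Length ~ Petal.Width + Sepal.Length,
              Petal.Width * Sepal.Length, data = iris)

# The same expression passed to lm()'s subset argument by name
m_subset <- lm(Petal.Length ~ Petal.Width + Sepal.Length,
               data = iris, subset = Petal.Width * Sepal.Length)

# Identical fits: the unnamed argument was matched to 'subset'
stopifnot(isTRUE(all.equal(coef(m_comma), coef(m_subset))))

print(m_comma$call)     # the matched call names the argument 'subset'
df.residual(m_comma)    # residual degrees of freedom of the subsetted fit
```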

You can recreate your result by passing the vector iris$Petal.Width * iris$Sepal.Length as row numbers. Fractional values are truncated toward zero, and values below 1 select no row. So be careful: this reuses some rows many times and skips many others. The result of this model therefore doesn't match a model that uses all the data, with each data point exactly once.

summary(lm(Petal.Length ~ Petal.Width + Sepal.Length, data = iris[iris$Petal.Width * iris$Sepal.Length, ]))
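The row selection itself can be inspected with a small sketch. It relies on R's usual rule for numeric indices: fractional values are truncated toward zero, and 0 is ignored. The names idx and rows are illustrative only. Petal.Width is at most 2.5 and Sepal.Length at most 7.9, so no product exceeds 19.75. Only rows 1 to 19 can ever be picked, and all of those are setosa.

```r
# The subset vector from the question's pattern, evaluated on iris
idx <- iris$Petal.Width * iris$Sepal.Length

# Numeric indices are truncated toward zero; an index of 0 selects nothing
rows <- trunc(idx)
rows <- rows[rows >= 1]

length(rows)               # rows actually fed to lm(), duplicates included
range(rows)                # lowest and highest row number used
length(unique(rows))       # how many distinct rows that is
table(iris$Species[rows])  # which species those rows belong to

# Same row count as the data frame indexing used in the answer
stopifnot(nrow(iris[idx, ]) == length(rows))
```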

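Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.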

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM