
Regarding the I() term in linear regression modeling in R using lm

I once saw a linear model fit written as follows:

lm(formula = Ozone ~ Solar.R + Wind + Temp + I(Wind^2) + I(Temp^2) +
     I(Wind * Temp) + I(Wind * Temp^2) + I(Temp * Wind^2) +
     I(Temp^2 * Wind^2), data = airquality)

I am not sure what I() means here. For example, what does I(Wind * Temp^2) do here? Can I write it as Wind:Temp^2?

The I() notation in R's formula syntax means "as is": I(a + b) simply adds the variable a + b (the arithmetic sum of a and b) as a single predictor in the lm model. In your case, I(Wind * Temp^2) means include as a predictor the product of Wind and the square of Temp. I() is used so that the arithmetic inside it is not confused with the operators of the formula syntax, where +, *, ^ and : have special meanings.
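A minimal sketch of the "as is" behaviour, assuming the built-in airquality data set used in the question. It compares a fit with and without I(), and checks that I(Wind * Temp^2) gives the same coefficients as a column computed by hand (WT2 is a hypothetical column name introduced only for this comparison).

```r
aq <- airquality  # built-in data set from the datasets package

# Without I(): + is a formula operator, so Wind and Temp enter as two predictors
length(coef(lm(Ozone ~ Wind + Temp, data = aq)))     # 3 (intercept + 2 slopes)
# With I(): Wind + Temp is computed first and enters as ONE predictor
length(coef(lm(Ozone ~ I(Wind + Temp), data = aq)))  # 2 (intercept + 1 slope)

# I(Wind * Temp^2) behaves like adding a precomputed column by hand
aq$WT2 <- aq$Wind * aq$Temp^2  # WT2 is a hypothetical column name
fit_I      <- lm(Ozone ~ I(Wind * Temp^2), data = aq)
fit_manual <- lm(Ozone ~ WT2, data = aq)
all.equal(unname(coef(fit_I)), unname(coef(fit_manual)))  # TRUE

# Without I(), * and ^ are formula operators: Wind * Temp^2 expands like Wind * Temp
attr(terms(Ozone ~ Wind * Temp^2), "term.labels")  # "Wind" "Temp" "Wind:Temp"
```

The last line shows why I() matters: dropping it silently turns one arithmetic predictor into two main effects plus an interaction.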

For more info, page 2 here explains it in full detail.

Hope this is clear!

UPDATE: I just want to add Hong Ooi's very good comment on this:

I(Wind * Temp^2) is not the same as Wind:Temp^2.

The ^n operator in formula syntax means "include these variables and all interactions up to n-way". For example, Y ~ (X + Z + W)^2 is equivalent to Y ~ X + Z + W + X:Z + X:W + Z:W.
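The expansion can be checked directly with terms(), which only parses the formula, so Y, X, Z and W need not exist as data. This is a small sketch comparing the term labels of the two formulas above, plus the ^3 case for contrast.

```r
# terms() only parses the formula; no data is needed
lhs <- attr(terms(Y ~ (X + Z + W)^2), "term.labels")
rhs <- attr(terms(Y ~ X + Z + W + X:Z + X:W + Z:W), "term.labels")
print(lhs)
setequal(lhs, rhs)  # TRUE: same main effects and two-way interactions

# Raising to ^3 would also add the three-way term X:Z:W
"X:Z:W" %in% attr(terms(Y ~ (X + Z + W)^3), "term.labels")  # TRUE
```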

So, in our case, Wind:Temp^2 means just Wind:Temp, that is, a single column equal to Wind * Temp rather than Wind * Temp^2.
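To see the difference in the actual predictor values, here is a sketch that builds the model matrices for both formulas on the airquality data (restricted to complete rows of the three columns used, so both matrices have the same rows) and compares the non-intercept column with hand-computed products.

```r
# Keep only the columns used, without missing values, so row counts line up
aq <- na.omit(airquality[, c("Ozone", "Wind", "Temp")])

# Wind:Temp^2: ^2 is a formula operator here, so this collapses to Wind:Temp
attr(terms(Ozone ~ Wind:Temp^2), "term.labels")  # "Wind:Temp"

# Column 1 of each model matrix is the intercept; column 2 is the predictor
x_colon <- model.matrix(Ozone ~ Wind:Temp^2, data = aq)[, 2]
x_I     <- model.matrix(Ozone ~ I(Wind * Temp^2), data = aq)[, 2]

all.equal(unname(x_colon), aq$Wind * aq$Temp)    # TRUE: plain Wind x Temp
all.equal(unname(x_I),     aq$Wind * aq$Temp^2)  # TRUE: Wind x Temp squared
```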

Small illustration:

# Random data, so the estimates below will differ on every run
Y <- runif(100)
X1 <- runif(100)
X2 <- runif(100)
df <- data.frame(Y, X1, X2)

> b <- lm(Y ~ X1:X2^2, data = df)
> summary(b)

Call:
lm(formula = Y ~ X1:X2^2, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.4802 -0.2490 -0.0173  0.2345  0.5066 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.45126    0.04794   9.413 2.28e-15 ***
X1:X2        0.08991    0.13414   0.670    0.504    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2965 on 98 degrees of freedom
Multiple R-squared:  0.004563,  Adjusted R-squared:  -0.005594 
F-statistic: 0.4493 on 1 and 98 DF,  p-value: 0.5043

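Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.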

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM