Matlab/R - linear regression with categorical & continuous predictors - why is the continuous predictor squared?
I'm doing a linear regression using categorical predictors and a numerical outcome between 0 and 1. On this page I saw it suggested to square a numerical predictor when it appears alongside a nominal one (see the third section, "Linear Regression with Categorical Predictor"). The example they give (for Matlab, but this generalizes to R as well) is the following formula, where Weight is continuous and Year is nominal:
mdl = fitlm(tbl,'MPG ~ Year + Weight^2')
Is this a universal rule? When I do it, I do get much stronger coefficients, but I want to make sure I'm not inflating them without warrant. Could someone explain the logic of using .^ for numerical predictors alongside categorical ones?
If you graph mpg vs. weight for each year separately and you see curvature, then a polynomial in weight might help correct for the non-linearity.
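library(lattice)
# Auto data set (mpg, weight, year, ...) read from a CSV on GitHub
u <- "https://raw.githubusercontent.com/shifteight/R/master/ISLR/Auto.csv"
Cars <- read.csv(u)
# Sort by year, then weight, so the smoother is drawn left to right in each panel
o <- with(Cars, order(year, weight))
# One panel per year: points plus a smooth curve to reveal any curvature
xyplot(mpg ~ weight | year, Cars[o, ], type = c("p", "smooth"))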
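Beyond eyeballing the panels, one way to check whether a squared term is warranted is to fit the model with and without it and compare the two nested fits. The following is only a minimal sketch under stated assumptions: it uses the built-in mtcars data as a stand-in so that it runs without a download, with wt playing the role of weight and cyl (converted to a factor) playing the role of the nominal year variable. Note that in an R formula a bare wt^2 is not a square (^ means crossing there), so the squared term is wrapped in I().

```r
# Stand-in data (an assumption for illustration): mtcars ships with R.
# wt stands in for weight, factor(cyl) for the nominal predictor.
cars <- transform(mtcars, cyl = factor(cyl))

# Straight-line effect of weight, with a separate intercept per group.
fit_linear <- lm(mpg ~ cyl + wt, data = cars)

# Same model plus a squared weight term; I() makes ^ mean arithmetic power.
fit_quad <- lm(mpg ~ cyl + wt + I(wt^2), data = cars)

# Nested-model F test: does the squared term explain additional variance?
cmp <- anova(fit_linear, fit_quad)
print(cmp)
```

If the F test and the plots both point to curvature, the squared term is doing real work; if not, larger-looking coefficients alone are not a reason to keep it. Squaring is therefore a modeling choice driven by the shape of the data, not something required by the presence of a categorical predictor.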