Matlab/R - linear regression with categorical & continuous predictors - why is the continuous predictor squared?

I'm doing a linear regression using categorical predictors and a numerical outcome between 0 and 1. On this page I saw it suggested to square a numerical predictor when it appears alongside a nominal one (see the third section, Linear Regression with Categorical Predictor). The example they give (for Matlab, but this generalizes to R as well) is the following formula, where Weight is continuous and Year is nominal:

mdl = fitlm(tbl,'MPG ~ Year + Weight^2')

Is this a universal rule? When I do it, I do get much stronger coefficients, but I want to make sure I'm not inflating them without warrant. Could someone explain the logic of using ^2 on numerical predictors alongside categorical ones?

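It is not a universal rule; the squared term is about the shape of the relationship between mpg and weight, not about the presence of a categorical predictor. If you plot mpg against weight separately for each year and see curvature, then a polynomial in weight might help correct for the non-linearity.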

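library(lattice)

# Auto data set from ISLR (columns include mpg, weight, year)
u <- "https://raw.githubusercontent.com/shifteight/R/master/ISLR/Auto.csv"
Cars <- read.csv(u)

# Sort rows by year, then by weight within each year
o <- with(Cars, order(year, weight))
# One panel per year: points plus a loess smooth to reveal curvature
xyplot(mpg ~ weight | year, Cars[o, ], type = c("p", "smooth"))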
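
If the panels do show curvature, one way to check whether the squared term is warranted is to compare the nested fits. The following is only a sketch on simulated data: the data frame sim and every number used to generate it are made up for illustration and are not taken from Auto.csv. Note also that in an R formula ^ is a formula operator, so weight^2 collapses to plain weight; the square has to be written as I(weight^2) or poly(weight, 2). In Matlab's fitlm, by contrast, Weight^2 in Wilkinson notation expands to both the Weight and Weight^2 terms.

```r
set.seed(1)

# Simulated stand-in for the Auto data (all values are hypothetical)
n <- 300
sim <- data.frame(
  year   = factor(sample(70:82, n, replace = TRUE)),
  weight = runif(n, 1600, 5000)
)
w <- sim$weight / 1000
sim$mpg <- 60 - 18 * w + 1.8 * w^2 +
  0.5 * (as.integer(sim$year) - 1) + rnorm(n, sd = 2)

# Straight line in weight, with year as a categorical predictor
fit_lin <- lm(mpg ~ year + weight, data = sim)

# Quadratic in weight: I() protects ^ from formula interpretation
fit_quad <- lm(mpg ~ year + weight + I(weight^2), data = sim)

# Without I(), weight^2 collapses to weight, so this matches fit_lin
fit_caret <- lm(mpg ~ year + weight^2, data = sim)
all.equal(coef(fit_caret), coef(fit_lin))

# F-test of the nested models: does the squared term improve the fit?
anova(fit_lin, fit_quad)
```

A small p-value in the anova table supports keeping the squared term. If the linear and quadratic fits are indistinguishable, the extra term is not earning its place, whatever it does to the size of the coefficients.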

