Matlab/R - linear regression with categorical & continuous predictors - why is the continuous predictor squared?

Question

I'm doing a linear regression using categorical predictors and a numerical outcome between 0 and 1. On this page I saw it suggested to square a numerical predictor when it is alongside a nominal one (see the third section, Linear Regression with Categorical Predictor). The example they give (for Matlab, but this generalizes to R as well) is the following formula, where Weight is continuous and Year is nominal:

mdl = fitlm(tbl,'MPG ~ Year + Weight^2')

Is this a universal rule? When I do it, I do get much stronger coefficients, but I want to make sure I'm not inflating them without warrant. Could someone explain the logic of using ^ for numerical predictors alongside categorical ones?

Answer 1

If you graph mpg vs. weight for each year separately and see curvature, then a polynomial in weight might help correct for the non-linearity.

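library(lattice)

# Auto data set from ISLR, read from a GitHub mirror
u <- "https://raw.githubusercontent.com/shifteight/R/master/ISLR/Auto.csv"
Cars <- read.csv(u)

# Sort by year, then weight, so the smoother in each panel is drawn left to right
o <- with(Cars, order(year, weight))
# One panel per year: points plus a smooth curve to reveal any curvature
xyplot(mpg ~ weight | year, Cars[o, ], type = c("p", "smooth"))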
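
Beyond the plot, one way to check whether a squared term is warranted is to compare the fit with and without it. The following is only a minimal sketch under stated assumptions: it uses R's built-in mtcars data as a stand-in (not the Auto data above), with cyl playing the role of the nominal predictor and wt the continuous one. Note that in an R formula a bare wt^2 is formula algebra and reduces to wt, so the squared term has to be written as I(wt^2); Matlab's fitlm notation, by contrast, treats Weight^2 as Weight plus its square.

```r
# Stand-in data: mtcars ships with R (an assumption, not the Auto data above).
# cyl is converted to a factor so it acts as the nominal predictor.
cars <- transform(mtcars, cyl = factor(cyl))

# Linear in weight vs. quadratic in weight, both alongside the categorical predictor.
fit_lin  <- lm(mpg ~ cyl + wt, data = cars)
fit_quad <- lm(mpg ~ cyl + wt + I(wt^2), data = cars)

# In an R formula, wt^2 without I() reduces to wt, so this is the same model as fit_lin.
fit_bare <- lm(mpg ~ cyl + wt^2, data = cars)

# Nested-model F test: does adding the squared term improve the fit?
cmp <- anova(fit_lin, fit_quad)
print(cmp)
```

If the F test shows little improvement from the squared term, the curvature is probably too weak to justify it for that data, so squaring is a modelling choice to be checked rather than a universal rule.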
