
Constrain number of predictor variables in stepwise regression in R

I would like to be able to do a forward stepwise linear regression, but constrain the number of predictor variables to a maximum (in my specific case, three). Here is some sample data.

set.seed(123)
myDep <- runif(100)

pred1 <- myDep + runif(100)
pred2 <- myDep + rnorm(100)
pred3 <- myDep + runif(100) + rnorm(100)
pred4 <- myDep + runif(100) + runif(100)
pred5 <- runif(100)

myDF <- data.frame(myDep, pred1, pred2, pred3, pred4, pred5)

If I simply run a linear regression with the code below, I get all five predictor variables, as expected.

myModel <- lm(myDep ~ ., data = myDF)

What I would like to do is use step() or another R command to run a forward stepwise selection that picks only three predictor variables and then stops.

For what it is worth, I tried this:

step(lm(myDep ~ ., data = myDF), steps = 3, direction = "forward")

The result is shown below. It is not what I want, because it uses all five predictor variables.

Start:  AIC=-378.09
myDep ~ pred1 + pred2 + pred3 + pred4 + pred5

Call:
lm(formula = myDep ~ pred1 + pred2 + pred3 + pred4 + pred5, data = myDF)

Coefficients:
(Intercept)        pred1        pred2        pred3        pred4        pred5  
   -0.16617      0.30043      0.07983      0.03670      0.17869      0.01606 

I'm sure there's a way to do this, but I cannot seem to figure out the proper formatting. Thanks in advance.
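
One likely reason the attempt above keeps all five predictors: step() is started from the full model, so a forward search has nothing left to add and returns its starting point. A minimal sketch of the alternative, assuming the usual pattern of starting from an intercept-only model and passing the candidate terms through scope, is below. The names nullModel and fwdModel are made up for illustration. Note that steps = 3 is only an upper bound: step() still stops earlier if no addition lowers the AIC.

```r
# Rebuild the sample data from the question so this block runs on its own.
set.seed(123)
myDep <- runif(100)
pred1 <- myDep + runif(100)
pred2 <- myDep + rnorm(100)
pred3 <- myDep + runif(100) + rnorm(100)
pred4 <- myDep + runif(100) + runif(100)
pred5 <- runif(100)
myDF <- data.frame(myDep, pred1, pred2, pred3, pred4, pred5)

# Start from the intercept-only model (hypothetical name: nullModel).
nullModel <- lm(myDep ~ 1, data = myDF)

# Forward search: each step adds at most one term, so steps = 3
# caps the result at three predictors. trace = 0 silences the log.
fwdModel <- step(nullModel,
                 scope = list(lower = ~ 1,
                              upper = ~ pred1 + pred2 + pred3 + pred4 + pred5),
                 direction = "forward",
                 steps = 3,
                 trace = 0)

# Names of the predictors that were kept (intercept dropped).
selected <- names(coef(fwdModel))[-1]
print(selected)
```

Because the search is greedy and AIC-driven, the three terms it picks need not be the best three-variable subset overall.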

You could use the regsubsets() function from the leaps package in R, where you can cap the number of variables (nvmax) and choose the selection method (method = "forward").

https://www.rdocumentation.org/packages/leaps/versions/2.1-1/topics/regsubsets

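library(leaps)

# nvmax caps the subset size (three here, as in the question);
# method = "forward" requests forward stepwise selection
b <- regsubsets(myDep ~ ., data = myDF, nbest = 1, nvmax = 3, method = "forward")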
summary(b)
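
summary(b) only reports which variables enter at each subset size, so a follow-up step is usually needed to get an ordinary lm fit. The sketch below shows one way to do that, assuming the leaps package is installed. It uses coef() on the regsubsets object to read off the three-variable model and reformulate() to rebuild the formula. The names chosen and finalModel are made up for illustration.

```r
library(leaps)

# Rebuild the sample data from the question so this block runs on its own.
set.seed(123)
myDep <- runif(100)
pred1 <- myDep + runif(100)
pred2 <- myDep + rnorm(100)
pred3 <- myDep + runif(100) + rnorm(100)
pred4 <- myDep + runif(100) + runif(100)
pred5 <- runif(100)
myDF <- data.frame(myDep, pred1, pred2, pred3, pred4, pred5)

# Forward selection, considering subsets of at most three predictors.
b <- regsubsets(myDep ~ ., data = myDF, nbest = 1, nvmax = 3, method = "forward")

# Coefficients of the three-variable model; drop the intercept to get names.
chosen <- names(coef(b, id = 3))[-1]

# Refit as a regular lm so the usual tools (summary, predict, ...) apply.
finalModel <- lm(reformulate(chosen, response = "myDep"), data = myDF)
summary(finalModel)
```

Unlike step(), regsubsets() does not stop on an AIC criterion here: it always returns a model for each size up to nvmax, and choosing id = 3 selects the largest one.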
