简体   繁体   中英

How to create many Linear Regression models via a For Loop in R?

My problem is for the mtcars data set in R, I need to create all possible additive linear regression models where I'm regressing on the mpg variable. The null model is easy, as there's

10 choose 0 ways to get the null model, and 10 choose 1 ways to create a SLR on mpg; 10 choose 2 ways to create a two variable regression on mpg; 10 choose 3 ways to create a SLR on mpg; etc.,

So in total, as this is equivalent to summing across the 10th row in Pascal's Triangle, the total models I need to consider comes out to be 1,024.

Now, the other tricky part is I need to somehow store each model in some separate object so that all the 2 variable models are grouped together, all the three variable models are grouped together, etc, on top of also storing all them together (though perhaps there's a more efficient way to do this). The reason for this is my task is to look at all of these models, take their AIC scores and their Mallow's Cp scores, store those in a data frame, and then sort those scores from lowest to highest and keep the top 10. On top of this, I need to also be able to store, see, and have access to/use the two best 1-variable models through the two best 10-variable models because I need to provide the r-squared values and adjusted r-squared values for these various models along with the error mean square value. I'm still pretty/relatively new to R/coding in general, but I provide my attempt below:

library(rje)   # provides the powerSet function
library(olsrr) # provides the ols_mallows_cp function to calculate the Mallow's Cp values

mtcars <- datasets::mtcars

x <- powerSet(colnames(mtcars[,-1]))

datalist <- list()
for(i in c(2:1024)){
  datalist[[i]] <- mtcars[,colnames(mtcars) %in% c("mpg",x[[i]]) ]
}

full_model <- lm(mpg ~ ., data = mtcars)
Cp_vec <- c()

for (i in c(2:1024)){
  model <- lm(mpg ~ ., data = datalist[[i]])
  Cp_vec[i] <- ols_mallows_cp(model, full_model)
}

names(Cp_vec) <- as.character(c(1:1024)) 
TenSmallestCp <- Cp_vec[cpvec %in% head(sort(Cp_vec),10)]
Small_List <- list()

for (i in 1:10){
  Small_List[[i]] <- x[[as.numeric(names(TenSmallestCp))[i]]]
}

Small_List[[1]]
Small_List[[2]]
Small_List[[3]]
Small_List[[4]]
Small_List[[5]]
Small_List[[6]]
Small_List[[7]]
Small_List[[8]]
Small_List[[9]]
Small_List[[10]]

The way I currently have it produces this as its output:

[1] "cyl" "wt" 
[1] "hp" "wt"
[1] "cyl" "hp"  "wt" 
[1] "cyl"  "wt"   "qsec"
[1] "hp" "wt" "am"
[1] "wt"   "qsec" "am"  
[1] "disp" "wt"   "qsec" "am"  
[1] "hp"   "wt"   "qsec" "am"  
[1] "cyl"  "wt"   "carb"
[1] "wt"   "qsec" "am"   "carb"

So this tells me what the 10 best models are with regards to the Mallow's Cp scores, but perhaps it's just because I've been staring at this problem for way too long, but I can't figure out how to actually save the linear model and have access to it, say, if I wanted to plot it or something. I know I could just easily recreate it with my output, but I'm also trying to become efficient with my coding and not always resort to hard coding things, you know? I also cannot figure out how to store the models based on the number of variables that are included in the model so I can access the top two models from each.

Before posting this, I checked out these links:

  1. How to Loop/Repeat a Linear Regression in R

  2. Regression with for-loop with changing variables

  3. R Loop for Variable Names to run linear regression model

I admit that because I'm new, the answer to my problem(s) might fully exist in some linear combination of these three answers, and I'm just having trouble seeing it and putting it together, but while I think the first link I shared does have a lot that's relevant to my problem, and the last one also is pretty related, I'm not sure how the second one is much help. That's why I'm posting this as a new question.

Thanks for taking the time to read this lengthy post and consider helping me with my problem here!

Your approach wasn't so bad. This is how I reproduced your work as you described it:

library(rje)   # provides the powerSet function
library(olsrr) # provides the ols_mallows_cp function to calculate the Mallow's Cp values

x <- powerSet(colnames(mtcars[,-1]))
full_model <- lm( mpg ~ ., data=mtcars )

your_models <- lapply( x, function(n) {
    d_i <- mtcars[,c( "mpg", n), drop=FALSE] # use drop=FALSE to make sure it stays a 2d structure
    return( lm( mpg ~ ., data = d_i ) )
})

Cp_vec <- sapply( your_models, function(m) {
    ols_mallows_cp( m, full_model )
})

TenSmallestIndeces <- head( order( Cp_vec ), n=10 )

TenSmallestCp <- head( sort( Cp_vec ), n=10 )

TenSmallestSets <- x[ TenSmallestIndeces ]

## inspect one of your models:
your_models[[ TenSmallestIndeces[1] ]]

It's always preferable to use some sort of apply when collecting from a loop. I frequently use foreach from the foreach package also when building data frames or other 2d structures from a loop.

I create the subset just like you did, and fit the model pretty much the same way, just do it in one go.

Then you just need to understand sort() and order() proberly to look back up in the set you started out with I think.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM