
How to create many Linear Regression models via a For Loop in R?

My problem is that, for the mtcars data set in R, I need to create all possible additive linear regression models with mpg as the response variable. The counting is straightforward: there are 10 choose 0 ways to get the null model, 10 choose 1 ways to create a simple linear regression (SLR) on mpg, 10 choose 2 ways to create a two-variable regression on mpg, 10 choose 3 ways to create a three-variable regression on mpg, and so on.

So in total, since this is equivalent to summing across the 10th row of Pascal's triangle, the number of models I need to consider is 2^10 = 1,024.
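The count can be checked directly in R. This is a minimal sketch using only base R (`choose()` and `ncol()`), assuming, as in the question, that every mtcars column other than mpg is a candidate predictor.

```r
# Candidate predictors: every mtcars column except the response mpg
p <- ncol(mtcars) - 1

# Number of models with k = 0, 1, ..., p predictors (row p of Pascal's triangle)
per_size <- choose(p, 0:p)
per_size    # 1 10 45 120 210 252 210 120 45 10 1

# Total number of additive models: the row sum, which equals 2^p
total <- sum(per_size)
total       # 1024
```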

Now, the other tricky part is that I need to store each model in some separate object so that all the two-variable models are grouped together, all the three-variable models are grouped together, and so on, on top of also storing them all together (though perhaps there's a more efficient way to do this). The reason is that my task is to look at all of these models, take their AIC scores and their Mallows' Cp scores, store those in a data frame, sort the scores from lowest to highest, and keep the top 10. On top of this, I also need to be able to store, view, and use the two best models of each size, from the two best 1-variable models up to the 10-variable model (there is only one of those), because I need to report the R-squared, adjusted R-squared, and error mean square values for these models. I'm still relatively new to R and to coding in general, but I provide my attempt below:

library(rje)   # provides the powerSet function
library(olsrr) # provides the ols_mallows_cp function to calculate the Mallows' Cp values

mtcars <- datasets::mtcars

x <- powerSet(colnames(mtcars[,-1]))

datalist <- list()
for(i in c(2:1024)){
  datalist[[i]] <- mtcars[,colnames(mtcars) %in% c("mpg",x[[i]]) ]
}

full_model <- lm(mpg ~ ., data = mtcars)
Cp_vec <- c()

for (i in c(2:1024)){
  model <- lm(mpg ~ ., data = datalist[[i]])
  Cp_vec[i] <- ols_mallows_cp(model, full_model)
}

names(Cp_vec) <- as.character(c(1:1024)) 
TenSmallestCp <- Cp_vec[Cp_vec %in% head(sort(Cp_vec), 10)] # %in% keeps the original index order, not the Cp order
Small_List <- list()

for (i in 1:10){
  Small_List[[i]] <- x[[as.numeric(names(TenSmallestCp))[i]]]
}

Small_List[[1]]
Small_List[[2]]
Small_List[[3]]
Small_List[[4]]
Small_List[[5]]
Small_List[[6]]
Small_List[[7]]
Small_List[[8]]
Small_List[[9]]
Small_List[[10]]

The way I currently have it, the code produces this output:

[1] "cyl" "wt" 
[1] "hp" "wt"
[1] "cyl" "hp"  "wt" 
[1] "cyl"  "wt"   "qsec"
[1] "hp" "wt" "am"
[1] "wt"   "qsec" "am"  
[1] "disp" "wt"   "qsec" "am"  
[1] "hp"   "wt"   "qsec" "am"  
[1] "cyl"  "wt"   "carb"
[1] "wt"   "qsec" "am"   "carb"

So this tells me which 10 models are best with regard to their Mallows' Cp scores. Perhaps it's just because I've been staring at this problem for way too long, but I can't figure out how to actually save each linear model and keep access to it, say, if I wanted to plot it. I know I could easily recreate a model from my output, but I'm also trying to write efficient code rather than always resorting to hard-coding things. I also can't figure out how to store the models according to the number of variables they include, so that I can access the top two models of each size.
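One way to keep every fitted model and still group them by size is a named list of lists. This is a sketch under stated assumptions, not the accepted answer's method: it uses base R's `combn()` and `reformulate()` instead of `rje::powerSet()`, and `fit_subset()`, `models_by_size` and `best_two_by_size` are hypothetical names. Within a fixed size, Cp, AIC, R-squared and adjusted R-squared all rank models in the same order as the residual sum of squares, so the sketch ranks on `deviance()`.

```r
predictors <- setdiff(names(mtcars), "mpg")

# Hypothetical helper: fit mpg on a set of predictors (empty set gives mpg ~ 1)
fit_subset <- function(vars) {
  rhs <- if (length(vars) == 0) "1" else vars
  lm(reformulate(rhs, response = "mpg"), data = mtcars)
}

# models_by_size[["k"]] is a list of every fitted model with exactly k predictors
sizes <- 0:length(predictors)
models_by_size <- lapply(setNames(sizes, sizes), function(k) {
  sets <- if (k == 0) list(character(0)) else combn(predictors, k, simplify = FALSE)
  lapply(sets, fit_subset)
})

# For a fixed size, ranking by RSS (deviance() of an lm) matches ranking by
# Cp, AIC, R^2 or adjusted R^2, so the two smallest RSS give the two best models
best_two_by_size <- lapply(models_by_size, function(ms) {
  rss <- vapply(ms, deviance, numeric(1))
  ms[head(order(rss), 2)]
})

# Each entry is still a complete lm object
best_two_var <- best_two_by_size[["2"]][[1]]
formula(best_two_var)
summary(best_two_var)$adj.r.squared
# plot(best_two_var)   # the usual diagnostic plots

# All 1024 models in one flat list as well
all_models <- unlist(models_by_size, recursive = FALSE)
```

Because the entries of `best_two_by_size` are real `lm` objects, `summary()`, `plot()` and `predict()` work on them without refitting.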

Before posting this, I checked out these links:

  1. How to Loop/Repeat a Linear Regression in R

  2. Regression with for-loop with changing variables

  3. R Loop for Variable Names to run linear regression model

I admit that, because I'm new, the answer to my problem(s) might fully exist in some linear combination of these three answers, and I'm just having trouble seeing it and putting it together. The first link has a lot that's relevant to my problem, and the last one is also closely related, but I'm not sure how much the second one helps. That's why I'm posting this as a new question.

Thanks for taking the time to read this lengthy post and for considering helping me with my problem!

Your approach wasn't so bad. This is how I reproduced your work as you described it:

library(rje)   # provides the powerSet function
library(olsrr) # provides the ols_mallows_cp function to calculate the Mallows' Cp values

x <- powerSet(colnames(mtcars[,-1]))
full_model <- lm( mpg ~ ., data=mtcars )

your_models <- lapply( x, function(n) {
    d_i <- mtcars[,c( "mpg", n), drop=FALSE] # use drop=FALSE to make sure it stays a 2d structure
    return( lm( mpg ~ ., data = d_i ) )
})

Cp_vec <- sapply( your_models, function(m) {
    ols_mallows_cp( m, full_model )
})

TenSmallestIndeces <- head( order( Cp_vec ), n=10 )

TenSmallestCp <- head( sort( Cp_vec ), n=10 )

TenSmallestSets <- x[ TenSmallestIndeces ]

## inspect one of your models:
your_models[[ TenSmallestIndeces[1] ]]
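To report R-squared, adjusted R-squared and the error mean square for a stored model, the values can be pulled from `summary()` and `deviance()`. A minimal sketch; `fit_stats()` is a hypothetical helper, and the two models in `stored` only stand in for entries of `your_models`:

```r
# Hypothetical helper: the statistics the assignment asks for, from one stored lm
fit_stats <- function(m) {
  s <- summary(m)
  c(r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    mse           = deviance(m) / df.residual(m),  # error mean square: SSE / (n - p)
    AIC           = AIC(m))
}

# Two stored models, standing in for entries of your_models
stored <- list(
  cyl_wt = lm(mpg ~ cyl + wt, data = mtcars),
  hp_wt  = lm(mpg ~ hp + wt, data = mtcars)
)

# One row per model, one column per statistic
stats_table <- do.call(rbind, lapply(stored, fit_stats))
stats_table
```

With the answer's objects, `fit_stats(your_models[[ TenSmallestIndeces[1] ]])` would return the same four numbers for the lowest-Cp model.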

It's usually preferable to use one of the apply functions (lapply(), sapply()) when collecting results from a loop. I also frequently use foreach() from the foreach package when building data frames or other 2D structures from a loop.
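For comparison, here is the same collection written both ways, as a small base-R sketch (foreach is left out so nothing beyond base R is needed; the three predictors in `vars` are an arbitrary illustrative choice):

```r
vars <- c("cyl", "hp", "wt")

# for loop: preallocate the result list, then fill it by index
fits_loop <- vector("list", length(vars))
for (i in seq_along(vars)) {
  fits_loop[[i]] <- lm(reformulate(vars[i], response = "mpg"), data = mtcars)
}

# lapply: the same list in one expression, with no index bookkeeping
fits_apply <- lapply(vars, function(v) lm(reformulate(v, response = "mpg"), data = mtcars))

# sapply: simplify one number per model into a plain numeric vector
r2 <- sapply(fits_apply, function(m) summary(m)$r.squared)
names(r2) <- vars
r2
```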

I create the subsets just like you did and fit the models in pretty much the same way; I just do it in one go, so every fitted lm object is kept in the list your_models.
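The "fit everything in one go and tabulate the scores" step can also be sketched with base R alone. This is an alternative under stated assumptions, not the answer's code: subsets come from `combn()` rather than `rje::powerSet()` (so the row order differs), `fit_subset()` and `mallows_cp()` are hypothetical helpers, and Cp uses the textbook formula Cp = SSE_p / MSE_full - n + 2p, with p counting the intercept. `olsrr::ols_mallows_cp()` is expected to compute the same quantity, but that is worth checking on a few models.

```r
predictors <- setdiff(names(mtcars), "mpg")   # the 10 candidate predictors
n <- nrow(mtcars)

# Hypothetical helper: fit mpg on a set of predictors (empty set gives mpg ~ 1)
fit_subset <- function(vars) {
  rhs <- if (length(vars) == 0) "1" else vars
  lm(reformulate(rhs, response = "mpg"), data = mtcars)
}

# All 2^10 = 1024 subsets: the empty set, then every subset of size 1 to 10
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(predictors), function(k) {
               combn(predictors, k, simplify = FALSE)
             }), recursive = FALSE))

models <- lapply(subsets, fit_subset)

# Error mean square of the full model, used as the sigma^2 estimate in Cp
full_model <- fit_subset(predictors)
mse_full <- deviance(full_model) / df.residual(full_model)

# Assumption: textbook Mallows' Cp, p = number of coefficients incl. intercept
mallows_cp <- function(m) deviance(m) / mse_full - n + 2 * length(coef(m))

scores <- data.frame(
  id   = seq_along(models),
  size = lengths(subsets),
  vars = vapply(subsets, paste, character(1), collapse = " + "),
  AIC  = vapply(models, AIC, numeric(1)),
  Cp   = vapply(models, mallows_cp, numeric(1))
)

# Ten lowest Cp and ten lowest AIC values, best first;
# scores$id indexes back into models for the fitted objects
top10_cp  <- head(scores[order(scores$Cp), ], 10)
top10_aic <- head(scores[order(scores$AIC), ], 10)
top10_cp
```

A quick sanity check: for the full model this Cp equals its number of coefficients (11), exactly as the formula implies.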

Then you just need to understand sort() and order() properly: sort() returns the sorted values, whereas order() returns the positions that would sort them, which you can use to look things back up in the set x (and in your_models) that you started out with.
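A toy illustration of the difference, with made-up Cp values and hypothetical subsets. The last line also shows why filtering with `%in%`, as in the question's code, returns the lowest values in their original order rather than ranked:

```r
# Made-up Cp values for four hypothetical subsets, for illustration only
Cp   <- c(a = 5.2, b = 2.1, c = 9.8, d = 1.4)
sets <- list(a = "wt", b = c("cyl", "wt"), c = "hp", d = c("hp", "wt"))

sort(Cp)                    # the values, smallest first: d, b, a, c
order(Cp)                   # the positions that sort Cp: 4 2 1 3
sets[head(order(Cp), 2)]    # the two subsets with the lowest Cp, best first: d, b

# Filtering with %in% finds the same two, but keeps the original order: b, d
Cp[Cp %in% head(sort(Cp), 2)]
```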

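Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.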

 