Automatic variable selection – Linear regression model

Question

In the MWE below, I have a data set with 70 potential predictors to explain my variable price1. I would like to run a univariate analysis with each of the variables, but the glmulti package says that I have too many predictors. How can a univariate analysis have too many predictors?

*I could do it with a loop / apply, but I am looking for something more elaborate. A similar question here doesn't solve the problem either.

test <- read.csv(url("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/Ecdat/Car.csv"))
library(glmulti)
glmulti.lm.out <- glmulti(data  = test, price1 ~ .,
                          level = 1,
                          method = "h",
                          maxK = 1,
                          confsetsize = 10,
                          fitfunction = "lm")

Error
Warning message:
In glmulti(y = "price1", data = test, level = 1, maxK = 1, method = "h",  :
  !Too many predictors.

Answer 1

This question is better suited to CrossValidated, but here's my two cents. Running an exhaustive search to find the best variables to include in a model is computationally heavy and gets out of hand quickly. Consider what you're asking the computer to do:

When you're running an exhaustive search, the computer builds a model for every possible combination of variables. For models of size one, that's not too bad, because that's only 70 models. But even for two-variable models, the computer has to fit n!/(r!(n-r)!) = 70!/(2!·68!) = 2415 different models. Things spiral out of control from there.
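To make the growth concrete, here is a small sketch in base R that counts the candidate models for the first few model sizes. The variable names are illustrative only, and the count assumes main effects with no interactions.

```r
# Number of candidate predictors, as in the question
n_predictors <- 70

# Number of distinct models containing exactly k predictors, for k = 1..5
models_by_size <- choose(n_predictors, 1:5)
names(models_by_size) <- paste0("size_", 1:5)
models_by_size

# All subsets of the 70 predictors, counting the intercept-only model
total_models <- 2^n_predictors
total_models
```

The first two entries reproduce the 70 and 2415 quoted above.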

As a workaround, I'll point you to the leaps package, which has the regsubsets function. With it, you can run forward or backward subset selection and find the most important variables in a stepwise manner. After running both, you may be able to toss out the variables that neither run selects and rerun your model with fewer predictors using glmulti, but no promises.

test.data <- read.csv(url("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/Ecdat/Car.csv"))[, 2:71]
library(leaps)

big_subset_model <- regsubsets(x = price1 ~ ., data = test.data, nbest = 1, 
method = "forward", really.big = TRUE, nvmax = 70)
sum.model <- summary(big_subset_model)
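One way to read the result of a forward search is sketched below. It is not part of the original answer: the built-in mtcars data stands in for the Car data so the example runs offline, and choosing the model size by lowest BIC is only one possible criterion.

```r
library(leaps)

# mtcars (10 candidate predictors for mpg) is a stand-in for the Car data
fwd <- regsubsets(mpg ~ ., data = mtcars, nbest = 1,
                  method = "forward", nvmax = 10)
fwd_summary <- summary(fwd)

# One row per model size; TRUE marks the variables included at that size
fwd_summary$which[1:3, ]

# Assumption: use BIC to choose the model size, then list its coefficients
best_size <- which.min(fwd_summary$bic)
coef(fwd, id = best_size)
```

The same pattern should apply to big_subset_model and sum.model above; fwd_summary$adjr2 and fwd_summary$cp are alternative criteria stored in the same summary object.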

Answer 2

A simple solution for univariate analysis using lapply.

test <- read.csv(url("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/Ecdat/Car.csv")) 

reg <- function(indep_var,dep_var,data_source) {
          formula <- as.formula(paste(dep_var," ~ ", indep_var))
          res     <- lm(formula, data = data_source)
          summary(res)
}

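# Fit one model per predictor; leave the response itself out of the candidates
predictors <- setdiff(colnames(test), "price1")
lapply(predictors, FUN = reg, dep_var = "price1", data_source = test)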
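With 70 predictors, a list of 70 printed summaries is hard to scan. A possible variation, sketched here in base R, collects one row per univariate model and sorts by p-value. The helper univariate_table is hypothetical, and mtcars stands in for the Car data so the example runs offline.

```r
# Hypothetical helper: fit dep_var ~ x for every other column x
# and return one row per model, ordered by the overall F-test p-value.
univariate_table <- function(dep_var, data_source) {
  predictors <- setdiff(colnames(data_source), dep_var)
  rows <- lapply(predictors, function(indep_var) {
    fit <- lm(reformulate(indep_var, response = dep_var), data = data_source)
    s <- summary(fit)
    f <- s$fstatistic
    data.frame(
      predictor = indep_var,
      r_squared = s$r.squared,
      p_value   = pf(f[["value"]], f[["numdf"]], f[["dendf"]],
                     lower.tail = FALSE),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out[order(out$p_value), ]
}

# mtcars is a stand-in; for the question's data the call would be
# univariate_table("price1", test)
ranked <- univariate_table("mpg", mtcars)
head(ranked, 3)
```

Screening predictors one at a time ignores correlations between them, so the ranking is a starting point rather than a final model.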


Answer 1
1, accepted, 2017-08-04 16:15:24

Answer 2
0, 2017-08-07 10:14:38
