
R Loop: Multiple linear regression models (exclude 1 variable at a time)

How do you create a loop that automates running several linear regression models? I have a full model with 12 independent variables. I want to create other models that each exclude 1 independent variable at a time.

Please see the example below:

    # round 1 full model
    formula <- Bound_Count~Days_diff_Eff_Subm_2 +
      TR_BS_BROKER_ID_360_2 + TR_BS_BROKER_ID_360_M +
      RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
      TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
      PIP_Flag +
      TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
      Resolved_Conflict +
      Priority_2

    # split train and test
    dataL_TT  <- dataL[dataL$DataSplit_Ind=="Modeling",]
    dataL_V <- dataL[dataL$DataSplit_Ind=="Validation",]
    # bind to submit model
    modelTT <- glm(formula
                   ,family=binomial(link = "logit"), data=dataL_TT)
    modelTT$aic

    # round 2 exclude TR_BS_BROKER_ID_360_M
    formula2 <- Bound_Count~Days_diff_Eff_Subm_2 +
      TR_BS_BROKER_ID_360_2 +
      RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
      TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
      PIP_Flag +
      TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
      Resolved_Conflict +
      Priority_2
    modelTT2 <- glm(formula2 , family=binomial(link = "logit"), data=dataL_TT)
    modelTT2$aic

    # round 3 exclude Days_diff_Eff_Subm_2
    formula3 <- Bound_Count~TR_BS_BROKER_ID_360_2 + TR_BS_BROKER_ID_360_M +
      RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
      TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
      PIP_Flag +
      TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
      Resolved_Conflict +
      Priority_2
    modelTT3 <- glm(formula3 , family=binomial(link = "logit"), data=dataL_TT)
    modelTT3$aic

    # round 4 exclude TR_BS_BROKER_ID_360_2
    formula4 <- Bound_Count~Days_diff_Eff_Subm_2 + TR_BS_BROKER_ID_360_M +
      RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
      TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
      PIP_Flag +
      TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
      Resolved_Conflict +
      Priority_2
    modelTT4 <- glm(formula4 , family=binomial(link = "logit"), data=dataL_TT)
    modelTT4$aic

And so on. Basically, I need 12 models, each of which excludes 1 distinct independent variable.

Here is an idea:

d <- data.frame(y = 1, x1 = 2, x2 = 3, x3 = 4)
allFeatures <- names(d)[-1] # exclude y
# container for models
listOfModels <- vector("list", length(allFeatures))
# loop over features
for (i in seq_along(allFeatures)) {
  # exclude feature i
  currentFeatures <- allFeatures[-i]
  # programmatically assemble regression formula
  regressionFormula <- as.formula(
     paste("y ~ ", paste(currentFeatures, collapse="+")))
  # fit model
  currentModel <- lm(formula = regressionFormula, data = d)
  # store model in container
  listOfModels[[i]] <- currentModel
} 

Then you just retrieve models from listOfModels with the standard list syntax, i.e. listOfModels[[1]] returns the model without x1, and so on.
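The same loop carries over to the logistic models in the question. Below is a minimal sketch, assuming a simulated data frame `sim` with made-up columns `y`, `x1`, `x2` and `x3` that stand in for `dataL_TT` and its 12 predictors (none of these names come from the original post). It builds each formula with `reformulate()` and names every model after the variable that was left out.

```r
set.seed(1)

# simulated stand-in for dataL_TT; all column names here are hypothetical
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(0.5 * sim$x1 - 0.8 * sim$x2))

predictors <- c("x1", "x2", "x3")

# fit one logistic model per predictor, leaving that predictor out
modelsWithout <- lapply(predictors, function(dropped) {
  f <- reformulate(setdiff(predictors, dropped), response = "y")
  glm(f, family = binomial(link = "logit"), data = sim)
})
names(modelsWithout) <- predictors

# AIC of each reduced model, labelled by the variable that was left out
aicWithout <- vapply(modelsWithout, AIC, numeric(1))
```

With the real data, `predictors` would hold the 12 variable names and `sim` would be replaced by `dataL_TT`; `modelsWithout[["x1"]]` is then the model fitted without `x1`.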

EDIT

I am not sure why you would want to sort the data for a histogram, but here:

vectorOfAICs <- vapply(listOfModels, function(x) AIC(x), numeric(1))
sortedAICs <- vectorOfAICs[order(vectorOfAICs)]
hist(sortedAICs)

The answer in the comment is pretty much spot on, with two caveats:

1) To get an AIC from a fitted lm model, the call is AIC(modelObject).

2) lapply() will give you back a list, which you probably don't want if your goal is to plot the data. It is better to use sapply() or vapply() to get back a numeric vector, which can be sorted and plotted more easily.
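To make the second caveat concrete, here is a small sketch that contrasts the two return types. It assumes three arbitrary models on the built-in `mtcars` data, chosen only for illustration; the list names are made up.

```r
# three small models on the built-in mtcars data (illustration only)
fits <- list(
  noWt   = lm(mpg ~ hp + qsec, data = mtcars),
  noHp   = lm(mpg ~ wt + qsec, data = mtcars),
  noQsec = lm(mpg ~ wt + hp,   data = mtcars)
)

aicList   <- lapply(fits, AIC)              # a list of length-1 numerics
aicVector <- vapply(fits, AIC, numeric(1))  # a named numeric vector

# a numeric vector can be sorted directly; the names travel with the values
sortedAICs <- sort(aicVector)
```

`sortedAICs` can be passed straight to a plotting function such as `barplot()`, whereas `aicList` would first need `unlist()`.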

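# fullmodel: the fitted model with no variables removed
# vars: character vector of the variables to remove, one at a time
vars <- c("var1", "var2")  # placeholder names, replace with your own
# refit the full model once per variable, dropping that variable each time
Map(function(x) update(fullmodel, paste0(". ~ . - ", x), data = datahere), vars)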

Map loops over vars, removing one variable at a time from the full model that was created. For example, update(lm(mtcars), .~.-cyl, data=mtcars) removes cyl from the model, i.e. it updates the lm object that had been created earlier. Of course, you can also use add1, drop1 and even drop.terms.
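A runnable version of this idea might look like the sketch below. It assumes the built-in `mtcars` data and an arbitrary three-predictor model in place of the 12-variable model from the question; `reduced` and `singleDeletions` are names introduced here for illustration.

```r
# full model on the built-in mtcars data, three predictors for brevity
fullmodel <- lm(mpg ~ wt + hp + qsec, data = mtcars)
vars <- c("wt", "hp", "qsec")

# refit once per variable, dropping that variable from the full formula
reduced <- Map(function(x) update(fullmodel, paste0(". ~ . - ", x)), vars)

# each element is named after the variable that was removed
formula(reduced[["hp"]])

# drop1() summarises every single-term deletion in one table
singleDeletions <- drop1(fullmodel)
singleDeletions
```

Because `vars` is a character vector, `Map()` uses its values as the names of the result. Note that the AIC column printed by `drop1()` comes from `extractAIC()`, which for lm fits differs from `AIC()` by an additive constant, so the ranking of the models is the same but the numbers are not.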
