How do you create a loop that automates fitting several regression models (logistic, fitted with glm())? I have a full model with 12 independent variables, and I want to create other models that each exclude one independent variable at a time.
Please see the example below:
#round 1 full model
formula <- Bound_Count~Days_diff_Eff_Subm_2 +
TR_BS_BROKER_ID_360_2 + TR_BS_BROKER_ID_360_M +
RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
PIP_Flag + TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
Resolved_Conflict + Priority_2
# split train and test
dataL_TT <- dataL[dataL$DataSplit_Ind=="Modeling",]
dataL_V <- dataL[dataL$DataSplit_Ind=="Validation",]
# fit the full model on the modeling (training) split
modelTT <- glm(formula, family = binomial(link = "logit"), data = dataL_TT)
modelTT$aic
# round 2 exclude TR_BS_BROKER_ID_360_M
formula2 <- Bound_Count~Days_diff_Eff_Subm_2 +
TR_BS_BROKER_ID_360_2 +
RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
PIP_Flag +
TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
Resolved_Conflict +
Priority_2
modelTT2 <- glm(formula2 , family=binomial(link = "logit"), data=dataL_TT)
modelTT2$aic
# round 3 exclude Days_diff_Eff_Subm_2
formula3 <- Bound_Count~TR_BS_BROKER_ID_360_2 + TR_BS_BROKER_ID_360_M +
RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
PIP_Flag +
TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
Resolved_Conflict +
Priority_2
modelTT3 <- glm(formula3 , family=binomial(link = "logit"), data=dataL_TT)
modelTT3$aic
# round 4 exclude TR_BS_BROKER_ID_360_2
formula4 <- Bound_Count~Days_diff_Eff_Subm_2 + TR_BS_BROKER_ID_360_M +
RURALPOP_P_CWR_2 + RURALPOP_P_CWR_M +
TR_B_BROKER_ID_360 + TR_SCW_BROKER_ID_360 +
PIP_Flag +
TR_BS_BROKER_INDIVIDUAL_720_2 + TR_BS_BROKER_INDIVIDUAL_720_M +
Resolved_Conflict +
Priority_2
modelTT4 <- glm(formula4 , family=binomial(link = "logit"), data=dataL_TT)
modelTT4$aic
And so on. Basically, I need 12 models, each excluding a different independent variable.
Here is an idea:
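set.seed(1) # toy data with 20 rows so every model can actually be fitted
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))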
allFeatures <- names(d)[-1] # exclude y
# container for models
listOfModels <- vector("list", length(allFeatures))
# loop over features
for (i in seq_along(allFeatures)) {
# exclude feature i
currentFeatures <- allFeatures[-i]
# programmatically assemble regression formula
regressionFormula <- as.formula(
paste("y ~ ", paste(currentFeatures, collapse="+")))
# fit model
currentModel <- lm(formula = regressionFormula, data = d)
# store model in container
listOfModels[[i]] <- currentModel
}
Then you just retrieve models from listOfModels with standard list syntax, i.e. listOfModels[[1]] returns the model without x1, and so on.
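The same loop can be adapted to the question's logistic models. A minimal sketch, assuming a hypothetical simulated data frame d with a 0/1 response y standing in for dataL_TT. It uses reformulate() from the stats package instead of paste()/as.formula(), and names each model after the variable it leaves out so you do not have to remember the index:

```r
# Hypothetical simulated data standing in for dataL_TT (binary response y)
set.seed(42)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(0.5 * d$x1 - 0.8 * d$x2))

allFeatures <- setdiff(names(d), "y")  # every predictor except the response

# Fit one logistic model per excluded predictor; lapply() returns a list
listOfModels <- lapply(allFeatures, function(excluded) {
  keep <- setdiff(allFeatures, excluded)
  # reformulate() builds e.g. y ~ x2 + x3 without paste()/as.formula()
  glm(reformulate(keep, response = "y"),
      family = binomial(link = "logit"), data = d)
})
# Name each model after the variable it leaves out
names(listOfModels) <- paste0("without_", allFeatures)

# glm objects carry their AIC in the $aic component
vapply(listOfModels, function(m) m$aic, numeric(1))
```

With the question's data, allFeatures could be built from the full formula, e.g. setdiff(all.vars(formula), "Bound_Count"), which yields the 12 predictor names.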
EDIT
I am not sure why you would want to sort the data for a histogram (hist() bins the values regardless of their order), but here it is:
vectorOfAICs <- vapply(listOfModels, function(x) AIC(x), numeric(1))
sortedAICs <- vectorOfAICs[order(vectorOfAICs)]
hist(sortedAICs)
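Since a histogram loses track of which variable each AIC belongs to, a sorted table may be more useful for deciding which variable to drop. A sketch under similar assumptions (hypothetical simulated data d with a binary response y; deltaAIC is measured against the full model and is a name introduced here for illustration):

```r
# Hypothetical simulated data; replace with your own data frame
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(d$x1 - d$x2))

allFeatures <- setdiff(names(d), "y")
fullModel <- glm(y ~ ., family = binomial, data = d)

# AIC of each reduced model; vapply() names the result by the dropped variable
aics <- vapply(allFeatures, function(excluded) {
  reduced <- glm(reformulate(setdiff(allFeatures, excluded), response = "y"),
                 family = binomial, data = d)
  AIC(reduced)
}, numeric(1))

# Sorted comparison table; deltaAIC > 0 means the reduced model has a higher
# (worse) AIC than the full model
aicTable <- data.frame(dropped = names(aics), AIC = unname(aics),
                       deltaAIC = unname(aics) - AIC(fullModel))
aicTable <- aicTable[order(aicTable$AIC), ]
print(aicTable, row.names = FALSE)
```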
The answer in the comment is pretty much spot on, with two caveats:
1) To get the AIC from a fitted lm model, call AIC(modelObject); unlike glm objects, lm objects have no $aic component.
2) lapply() will give you back a list, which you probably don't want if your goal is to plot the data. Better to use sapply() or vapply() to get back a numeric vector, which can be sorted and plotted more easily.
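A quick way to check both caveats, using the built-in mtcars data (treating the 0/1 column am as a binary response for the glm is purely for illustration):

```r
# Built-in mtcars data; am is 0/1, so it can serve as a binary response
fitLm <- lm(mpg ~ wt + hp, data = mtcars)
fitGlm <- glm(am ~ wt, family = binomial, data = mtcars)

is.null(fitLm$aic)                  # TRUE: lm objects have no $aic component
AIC(fitLm)                          # but AIC() works on them
all.equal(fitGlm$aic, AIC(fitGlm))  # TRUE: for glm, both give the same value

# A list of models: lapply() returns a list, vapply() a numeric vector
models <- list(fitLm, lm(mpg ~ wt, data = mtcars))
str(lapply(models, AIC))
vapply(models, AIC, numeric(1))
```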
fullmodel <- modelTT # full model from the question, no variables removed
vars <- attr(terms(fullmodel), "term.labels") # all predictors, to be removed one at a time
Map(function(x) update(fullmodel, paste0(". ~ . - ", x)), vars) # named list of reduced models
The Map call loops over vars, removing each variable in turn from the full model that was already fitted. For example, update(lm(mpg ~ ., data = mtcars), . ~ . - cyl) removes cyl from the lm model, i.e. it refits the previously created lm object without that term, reusing the data from the original call. Of course, you can also use add1, drop1, and even drop.terms.
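Since the goal is to compare the AIC of each single-variable deletion, drop1() can do that in one call. A sketch, assuming hypothetical simulated data with a binary response, that builds the reduced models with Map() and update() as above and checks their AICs against drop1():

```r
# Hypothetical simulated data with a binary response, mirroring the question's glm
set.seed(7)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(0.7 * d$x1 - 0.5 * d$x3))

fullModel <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = d)

# Map + update: one refitted model per dropped term, named by that term
vars <- attr(terms(fullModel), "term.labels")
reducedModels <- Map(function(x) update(fullModel, paste0(". ~ . - ", x)), vars)
mapAICs <- vapply(reducedModels, AIC, numeric(1))

# drop1() performs the single-term deletions itself; row "<none>" is the full model
dropTable <- drop1(fullModel)
print(dropTable)
```

For glm fits, the AIC column of drop1() is on the same scale as AIC(). For lm fits, drop1() reports extractAIC() values, which differ from AIC() by a constant, so the ranking of the deletions is the same but the numbers are not.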