
How to show the coefficient values and variable importance for logistic regression in R using caret package train() and varImp()

We're running an exploratory logistic regression and want to determine how important each variable is in predicting the outcome. We are using the train() and varImp() functions from the caret package. Ultimately, we would like to create a table/data frame with three columns: Variable Name, Importance, and Coefficient, like this:

Desired format of output.


Here's some sample code to illustrate:

library(caret)

# Create a sample dataframe

my_DV <- c(0, 1, 0, 1, 1)
IV1 <- c(10, 40, 15, 35, 38)
IV2 <- c(1, 0, 1, 0, 1)
IV3 <- c(5, 4, 3, 2, 1)
IV4 <- c(5, 7, 3, 8, 9)
IV5 <- c(1, 2, 1, 2, 1)

df <- data.frame(my_DV, IV1, IV2, IV3, IV4, IV5)
df$my_DV <- as.factor(df$my_DV)
df$IV1 <- as.numeric(df$IV1)
df$IV2 <- as.factor(df$IV2)
df$IV3 <- as.numeric(df$IV3)
df$IV4 <- as.numeric(df$IV4)
df$IV5 <- as.factor(df$IV5)

# train model/perform logistic regression
model_one <- train(form = my_DV ~ ., data = df, trControl = trainControl(method = "cv", number = 5), 
    method = "glm", family = "binomial", na.action=na.omit)
summary(model_one)

# get the variable importance
imp <- varImp(model_one)
imp

I would like to take the importance values in imp and merge them with the coefficients from model_one, but I'm fairly new to R and can't figure out how to do it.

Any suggestions are greatly appreciated!

Here is one of many ways to get the desired output:

Assign the model summary to an object, extract the coefficient estimates from it with coef(), and bind them to the variable names and importance scores in a data frame. Because imp$importance is itself a data frame with a single column named Overall, that column keeps the name Overall in the result instead of being renamed Importance. Finally, sort the rows by importance, highest first, using order().

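# Coefficient table of the final glm fit (one row per model term)
sum_mod <- summary(model_one)
# Row names of imp$importance are the model terms; use them to look up the estimates
dat <- data.frame(VariableName = rownames(imp$importance), 
    Importance = imp$importance,   # one-column data frame, so the column stays "Overall"
    Coefficient = coef(sum_mod)[rownames(imp$importance), "Estimate"], 
    row.names = NULL) 
# Sort by importance, most important first
dat <- dat[order(dat$Overall, decreasing = TRUE),]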

The result:

  VariableName   Overall Coefficient
1          IV1 100.00000   1.0999732
4          IV4  74.48458   3.6665775
2         IV21  34.43803  -7.8831404
3          IV3   0.00000  -0.9166444
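
Since the question asks about merging, here is a minimal sketch of the same idea using merge() keyed on the variable name. It is an illustration under stated assumptions, not the code from the answer above: to stay runnable without caret it fits a plain glm() on the built-in mtcars data, and imp_df is a hypothetical stand-in shaped like imp$importance (model terms as row names, one column called Overall). The absolute z statistic is used only as a placeholder importance score.

```r
# Stand-in model fitted with base R so the sketch runs without caret.
# (Assumption: mtcars replaces the sample data from the question.)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Coefficient table: a matrix with columns Estimate, Std. Error, z value, Pr(>|z|)
coef_tab <- coef(summary(fit))
coef_tab <- coef_tab[rownames(coef_tab) != "(Intercept)", , drop = FALSE]

# Hypothetical importance table laid out like imp$importance:
# row names are the model terms, single column "Overall".
imp_df <- data.frame(Overall = abs(coef_tab[, "z value"]),
                     row.names = rownames(coef_tab))

# Move the row names into an explicit key column on both sides
importance_df <- data.frame(VariableName = rownames(imp_df),
                            Importance   = imp_df$Overall)
coef_df <- data.frame(VariableName = rownames(coef_tab),
                      Coefficient  = coef_tab[, "Estimate"],
                      row.names    = NULL)

# Join on the variable name, then sort by importance (highest first)
result <- merge(importance_df, coef_df, by = "VariableName")
result <- result[order(result$Importance, decreasing = TRUE), ]
rownames(result) <- NULL
print(result)
```

With a caret model, imp$importance would take the place of imp_df and coef(summary(model_one)) the place of coef_tab. Joining on a key column instead of indexing by row name means the two tables do not have to be in the same order, and a term present in only one of them is dropped by default (merge() has all = TRUE to keep such rows).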
