Plotting variable importance from an ensemble of models with a for loop

I keep running into an error while attempting to plot variable importance from an ensemble of models.

I have fitted an ensemble of models and am now trying to create a variable importance plot for each algorithm. I use the varImp() function from caret to extract variable importance and then plot() it. To fit the ensemble, I use the caretEnsemble package.

Thank you for any help. Please see the example code below.

# Caret ensemble is needed to produce list of models
library(caret)
library(caretEnsemble)

# Set algorithms I wish to fit
my_algorithms <- c("glmnet", "svmRadial", "rf", "nnet", "knn", "rpart")

# Define controls
my_controls <- trainControl(
  method = "cv",
  savePredictions = "final",
  number = 3
)

# Run the models all at once with caretEnsemble
my_list_of_models <- caretEnsemble::caretList(Species ~ .,
                                 data = iris,
                                 trControl = my_controls,
                                 methodList = my_algorithms)
# Subset models
list_of_algorithms <- my_list_of_models[my_algorithms]

# Create first for loop to extract variable importance via caret::varImp()
importance <- list()
for (algo in seq_along(list_of_algorithms)) {
  importance[[algo]] <- varImp(list_of_algorithms[[algo]])
}

# Create second loop to go over extracted importance and plot it using plot()
importance_plots <- list()
for (imp in seq_along(importance)) {
  importance_plots[[imp]] <- plot(importance[[imp]])
}

# Error occurs during the second for loop:
Error in data.frame(values = unlist(unname(x)), ind, stringsAsFactors = FALSE):arguments imply differing number of rows: 16, 
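One way to narrow down an error like this is to run the failing step inside tryCatch(), so the loop finishes and reports which element failed instead of stopping at the first problem. The sketch below uses only base R; try_each is a hypothetical helper name, and the toy list merely stands in for the list of varImp() results.

```r
# Hypothetical helper (not part of caret): apply `f` to every element of `x`,
# capturing errors instead of stopping at the first one.
try_each <- function(x, f) {
  lapply(x, function(element) {
    tryCatch(
      list(ok = TRUE, value = f(element), error = NA_character_),
      error = function(e) {
        list(ok = FALSE, value = NULL, error = conditionMessage(e))
      }
    )
  })
}

# Toy input: the second element triggers an error, like one model in the loop
toy <- list(first = 1, second = "not a number", third = 3)
results <- try_each(toy, function(v) v + 1)

# Names of the elements that failed
failed <- names(results)[!vapply(results, function(r) r$ok, logical(1))]
print(failed)
```

Applied to the question's objects, the call would presumably look like try_each(importance, plot), after naming the list (for example names(importance) <- my_algorithms) so the failing models can be identified by name.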

I came up with a solution to the problem above and decided to post it as my own answer. I wrote a small function that plots variable importance without relying on caret's plotting helpers. It uses dotplot and levelplot because the importance data.frame that caret returns has a different shape depending on the algorithm: a single column for some models, several columns for others. It may not work for other algorithms, or for models that failed to fit.
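The shape difference can be checked directly before choosing a plot type. The sketch below is a base R illustration with mocked data frames: all numbers are invented, the column names are assumptions about what the importance element of a varImp() result may look like, and importance_shape is a hypothetical helper that mirrors the ncol rule used in the function that follows.

```r
# Mocked stand-ins for the `importance` data frame inside a varImp() result.
# All values are invented for illustration; they are not real model output.
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

single_column <- data.frame(Overall = c(10, 0, 100, 85), row.names = predictors)
per_class <- data.frame(
  setosa = c(90, 60, 100, 100),
  versicolor = c(85, 55, 100, 95),
  virginica = c(90, 60, 100, 100),
  row.names = predictors
)

# Hypothetical helper: report which plot type the ncol rule would pick
importance_shape <- function(importance_df) {
  if (ncol(importance_df) < 2) "dotplot" else "levelplot"
}

shapes <- vapply(
  list(single_column = single_column, per_class = per_class),
  importance_shape,
  character(1)
)
print(shapes)
```

With real models, the same check could be run as vapply(importance, function(x) ncol(x$importance), integer(1)) to see which algorithms return more than one column.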

# Libraries ---------------------------------------------------------------
library(caret) # To train ML algorithms
library(dplyr) # Required for %>% operators in custom function below
library(caretEnsemble) # To train multiple caret models
library(lattice) # Required for plotting, should be loaded alongside caret
library(gridExtra) # Required for plotting multiple plots

# Custom function ---------------------------------------------------------
# Takes the list of varImp() results, an index into that list, and the
# model names; it is called once per model in the for loop below
plot_importance <- function(importance_list, imp, algo_names) {
  importance <- importance_list[[imp]]$importance
  model_title <- algo_names[[imp]]
  if (ncol(importance) < 2) { # Single importance column: draw a dotplot
    importance %>%
      as.matrix() %>%
      dotplot(main = model_title)
  } else { # Two or more columns: draw a heatmap
    importance %>%
      as.matrix() %>%
      levelplot(xlab = NULL, ylab = NULL, main = model_title, scales = list(x = list(rot = 45)))
  }
}

# Tuning parameters -------------------------------------------------------
# Set algorithms I wish to fit
# Instead of methodList (used in the question), I switched to tuneList because
# I need to control the tuning parameters of the random forest algorithm.

my_algorithms <- list(
  glmnet = caretModelSpec(method = "glmnet"),
  rpart = caretModelSpec(method = "rpart"),
  svmRadial = caretModelSpec(method = "svmRadial"),
  rf = caretModelSpec(method = "rf", importance = TRUE), # Importance is not computed for "rf" by default
  nnet = caretModelSpec(method = "nnet"),
  knn = caretModelSpec(method = "knn")
)

# Define controls
my_controls <- trainControl(
  method = "cv",
  savePredictions = "final",
  number = 3
)

# Run the models all at once with caretEnsemble
my_list_of_models <- caretList(Species ~ .,
  data = iris,
  tuneList = my_algorithms,
  trControl = my_controls
)

# Extract variable importance ---------------------------------------------
importance <- lapply(my_list_of_models, varImp)

# Plotting variable importance --------------------------------------------
# Loop over the extracted importance and plot it with plot_importance()
importance_plots <- list()
for (imp in seq_along(importance)) {
  importance_plots[[imp]] <- plot_importance(
    importance_list = importance,
    imp = imp,
    algo_names = names(my_list_of_models)
  )
}

# Multiple plots at once
grid.arrange(grobs = importance_plots)
