Integration of variable importance plots within the tidy modelling framework
Could somebody show me how to generate permutation-based variable importance plots within the tidy modelling framework? Currently, I have this:
library(tidymodels)
library(vip)

# variable importance
final_fit_train %>%
  pull_workflow_fit() %>%
  vip(geom = "point",
      aesthetics = list(color = cbPalette[4],
                        fill = cbPalette[4])) +
  THEME +
  ggtitle("Elastic Net")
which generates this:
However, I would like to have something like this:
It's not clear to me how the rather new tidy modelling framework integrates with the current vip package. Could anybody help? Thanks!
https://koalaverse.github.io/vip/articles/vip.html (API of the vip package).
To compute variable importance using permutation, you need to put together just a few more pieces than for model-specific variable importance. Let's look at an example for an SVM model, which does not have a model-specific variable importance score.
library(tidymodels)
#> ── Attaching packages ──────────────────────── tidymodels 0.1.1 ──
#> ✓ broom 0.7.0 ✓ recipes 0.1.13
#> ✓ dials 0.0.8 ✓ rsample 0.0.7
#> ✓ dplyr 1.0.0 ✓ tibble 3.0.3
#> ✓ ggplot2 3.3.2 ✓ tidyr 1.1.0
#> ✓ infer 0.5.3 ✓ tune 0.1.1
#> ✓ modeldata 0.0.2 ✓ workflows 0.1.2
#> ✓ parsnip 0.1.2 ✓ yardstick 0.0.7
#> ✓ purrr 0.3.4
#> ── Conflicts ─────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x recipes::step() masks stats::step()
data("hpc_data")

svm_spec <- svm_poly(degree = 1, cost = 1/4) %>%
  set_engine("kernlab") %>%
  set_mode("regression")

svm_fit <- workflow() %>%
  add_model(svm_spec) %>%
  add_formula(compounds ~ .) %>%
  fit(hpc_data)
svm_fit
#> ══ Workflow [trained] ════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: svm_poly()
#>
#> ── Preprocessor ──────────────────────────────────────────────────
#> compounds ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────
#> Support Vector Machine object of class "ksvm"
#>
#> SV type: eps-svr (regression)
#> parameter : epsilon = 0.1 cost C = 0.25
#>
#> Polynomial kernel function.
#> Hyperparameters : degree = 1 scale = 1 offset = 1
#>
#> Number of Support Vectors : 2827
#>
#> Objective Function Value : -284.7255
#> Training error : 0.835421
Our model is now trained, so it's ready for computing variable importance. Notice a couple of steps:

- You pull() the fitted model object out of the workflow.
- You must specify the target/outcome variable, compounds.
- You need to pass both the original training data (use training data here, not testing data) and the correct underlying function for prediction via pred_wrapper (this can be tricky to figure out in some cases, but for most packages it is just predict()).

library(vip)
#>
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#>
#> vi
svm_fit %>%
  pull_workflow_fit() %>%
  vip(method = "permute",
      target = "compounds", metric = "rsquared",
      pred_wrapper = kernlab::predict, train = hpc_data)
Created on 2020-07-17 by the reprex package (v0.3.0)
You can increase nsim here to repeat the permutation shuffling more than once.
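As a sketch of what that looks like (assuming the same svm_fit workflow and hpc_data from above are in scope), you can combine a larger nsim with a boxplot geom so the plot shows the spread of importance scores across the repeated permutations:

```r
library(vip)

# Sketch: repeat the permutation shuffling 10 times (nsim = 10) and show
# the distribution of scores per predictor as boxplots instead of points.
# Assumes the fitted `svm_fit` workflow and `hpc_data` from above.
svm_fit %>%
  pull_workflow_fit() %>%
  vip(method = "permute", nsim = 10,
      target = "compounds", metric = "rsquared",
      pred_wrapper = kernlab::predict, train = hpc_data,
      geom = "boxplot")
```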
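If you want full control over the styling (as in the question's desired plot), one hedged option is to skip vip()'s built-in plotting and use vi() instead, which returns the permutation scores as a tibble with Variable and Importance columns that you can feed into your own ggplot. A minimal sketch, again assuming the fitted svm_fit workflow and hpc_data from above:

```r
library(vip)
library(ggplot2)

# Sketch: compute permutation importance scores as a tibble with vi(),
# then build a custom ggplot from it. Assumes `svm_fit` and `hpc_data`
# from above are in scope.
svm_fit %>%
  pull_workflow_fit() %>%
  vi(method = "permute",
     target = "compounds", metric = "rsquared",
     pred_wrapper = kernlab::predict, train = hpc_data) %>%
  ggplot(aes(x = Importance, y = reorder(Variable, Importance))) +
  geom_point() +
  labs(y = NULL, title = "Permutation importance")
```

Because vi() hands back an ordinary tibble, any ggplot2 theme, palette, or geom can be applied from there.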