Permutation-based variable importance (violin) plots for random forest in tidymodels
I have built a random forest model with tidymodels, very similar to what Julia Silge did in this video. I also plan to show variable importance plots based on the permutation method; however, I would like to show box plots or violin plots rather than points.
Here is an example, following Julia's code:
Data and Model Building
# DATA
library(tidyverse)
water_raw <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-04/water.csv")
# Data prep
water <- water_raw %>%
  filter(
    country_name == "Sierra Leone",
    lat_deg > 0, lat_deg < 15, lon_deg < 0,
    status_id %in% c("y", "n")
  ) %>%
  mutate(pay = case_when(
    str_detect(pay, "^No") ~ "no",
    str_detect(pay, "^Yes") ~ "yes",
    is.na(pay) ~ pay,
    TRUE ~ "it's complicated"
  )) %>%
  select(-country_name, -status, -report_date) %>%
  mutate_if(is.character, as.factor)
library(tidymodels)
set.seed(123)
water_split <- initial_split(water, strata = status_id)
water_train <- training(water_split)
water_test <- testing(water_split)
set.seed(234)
water_folds <- vfold_cv(water_train, strata = status_id)
water_folds
# Model building
library(themis)
ranger_recipe <-
  recipe(formula = status_id ~ ., data = water_train) %>%
  update_role(row_id, new_role = "id") %>%
  step_unknown(all_nominal_predictors()) %>%
  step_other(all_nominal_predictors(), threshold = 0.03) %>%
  step_impute_linear(install_year) %>%
  step_downsample(status_id)
ranger_spec <-
  rand_forest(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("ranger")
ranger_workflow <-
  workflow() %>%
  add_recipe(ranger_recipe) %>%
  add_model(ranger_spec)
doParallel::registerDoParallel()
set.seed(74403)
ranger_rs <-
  fit_resamples(ranger_workflow,
    resamples = water_folds,
    control = control_resamples(save_pred = TRUE)
  )
Here is Julia's VIP code:
library(vip)
imp_data <- ranger_recipe %>%
  prep() %>%
  bake(new_data = NULL) %>%
  select(-row_id)
ranger_spec %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(status_id ~ ., data = imp_data) %>%
  vip(geom = "point")
[Image: Julia's VIP with points]
My attempt:
ranger_spec %>%
  set_engine("ranger", importance = "permutation") %>%
  fit(status_id ~ ., data = imp_data) %>%
  vip(pred_wrapper = predict, geom = "boxplot", nsim = 10, keep = TRUE)
However, it keeps returning this error:
Error: To construct boxplots for permutation-based importance scores you must specify keep = TRUE in the call vi() or vi_permute(). Additionally, you also need to set nsim >= 2.
Because I have done all of those things, I assume my error is with pred_wrapper, but I'm not sure. What am I doing wrong here?
Thanks, y'all!
First, you may be interested in a resampling approach to estimating variable importance, where you yourself control the resampling and what gets extracted.
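As a rough illustration of that resampling idea (a minimal sketch, not the code from the linked post), the block below fits one ranger forest per bootstrap resample, extracts its permutation importance, and draws the spread as violins. It assumes the rsample, ranger, and ggplot2 packages are installed; the built-in iris data stands in for the water data, and the 10 resamples and 200 trees are arbitrary choices.

```r
library(rsample)
library(ranger)
library(ggplot2)

set.seed(123)
# Assumption: iris is a stand-in for the baked training data
boots <- bootstraps(iris, times = 10)

# Fit one forest per resample and pull out its permutation importance
imp_by_resample <- lapply(seq_along(boots$splits), function(i) {
  fit <- ranger(
    Species ~ .,
    data = analysis(boots$splits[[i]]),
    num.trees = 200,
    importance = "permutation"
  )
  data.frame(
    resample = boots$id[i],
    variable = names(fit$variable.importance),
    importance = unname(fit$variable.importance)
  )
})
imp_all <- do.call(rbind, imp_by_resample)

# One violin per predictor, ordered by mean importance across resamples
p <- ggplot(imp_all, aes(x = importance, y = reorder(variable, importance))) +
  geom_violin() +
  labs(y = NULL)
```

Printing `p` draws the plot; swapping `geom_violin()` for `geom_boxplot()` gives box plots instead. Note that the spread here reflects resampling variability, which is not the same as the spread across repeated permutations of a single fitted model.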
Second, I think something is not working quite right with method = "permute" for a tidymodels model. I can't get it to work, but I can get the permutation importance for the underlying model:
library(vip)
imp_data <- ranger_recipe %>%
  prep() %>%
  bake(new_data = NULL) %>%
  select(-row_id)
mod <- ranger::ranger(status_id ~ ., data = imp_data, classification = TRUE)
pred_fun <- function(object, newdata) {
  predict(object, newdata)$predictions
}
vip(mod, method = "permute",
    train = imp_data, target = "status_id",
    metric = "accuracy", pred_wrapper = pred_fun)
Created on 2022-09-02 with reprex v2.0.2
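To get box plots or violins rather than a single point per variable, the same call pattern can be extended with nsim and keep. The sketch below is one way this might look, assuming ranger and vip are installed; iris stands in for imp_data so that the example runs on its own, and the 200 trees and nsim = 10 are arbitrary. A likely reading of the error in the question is that, without method = "permute", vip() falls back to the model-based importance stored by ranger, which has no per-permutation scores to plot.

```r
library(ranger)
library(vip)

set.seed(123)
# Assumption: iris replaces imp_data, Species replaces status_id
mod <- ranger(Species ~ ., data = iris, num.trees = 200)

# Return predicted classes as a factor, which the accuracy metric compares to the truth
pred_fun <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}

# nsim >= 2 with keep = TRUE retains one score per permutation
vi_scores <- vi_permute(
  mod,
  train = iris,
  target = "Species",
  metric = "accuracy",
  pred_wrapper = pred_fun,
  nsim = 10,
  keep = TRUE
)

# geom = "violin" is the other distribution-style option
p <- vip(vi_scores, geom = "boxplot")
```

Printing `p` draws the plot. The individual permutation scores are stored in the "raw_scores" attribute of `vi_scores`, which is what the boxplot geom reads.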
Here is another resource for how to use vip, but you may want to look into using DALEX for permutation variable importance.