SHAP Importance for Ranger in R
Having a binary classification problem: how can I get the SHAP contributions of the variables for a ranger model?
Sample data:
library(ranger)
library(tidyverse)
# Binary Dataset
df <- iris
df$Target <- if_else(df$Species == "setosa",1,0)
df$Species <- NULL
# Train Ranger Model
model <- ranger(
  x = df %>% select(-Target),
  y = df %>% pull(Target)
)
I have tried several libraries (DALEX, shapr, fastshap, shapper) but didn't get a working solution.
I would like to get results like those of SHAPforxgboost for xgboost, e.g.:
shap.values, which gives the SHAP contribution of each variable
shap.plot.summary
Good morning! From what I have found, you can use ranger() with fastshap as follows:
library(fastshap)
library(ranger)
library(tidyverse)
data(iris)
# Binary Dataset
df <- iris
df$Target <- if_else(df$Species == "setosa",1,0)
df$Species <- NULL
x <- df %>% select(-Target)
# Train Ranger Model
model <- ranger(
  x = df %>% select(-Target),
  y = df %>% pull(Target)
)
# Prediction wrapper
pfun <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}
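The wrapper's only contract is that it takes the fitted model and new data and returns a plain numeric vector of predictions, one per row; fastshap calls it internally during the Monte Carlo sampling. A quick sanity check (a sketch, assuming `model` and `x` from the code above):

```r
# The wrapper must return a numeric vector, one prediction per row;
# ranger's predict() returns a list, which is why pfun extracts $predictions
p <- pfun(model, x)
stopifnot(is.numeric(p), length(p) == nrow(x))
```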
# Compute fast (approximate) Shapley values using 10 Monte Carlo repetitions
system.time({  # estimate run time
  set.seed(5038)
  shap <- fastshap::explain(model, X = x, pred_wrapper = pfun, nsim = 10)
})
# Load required packages
library(ggplot2)
theme_set(theme_bw())
# Aggregate Shapley values
shap_imp <- data.frame(
  Variable = colnames(shap),  # explain() returns a matrix in recent fastshap versions
  Importance = apply(shap, MARGIN = 2, FUN = function(x) mean(abs(x)))  # mean(|SHAP|), matching the plot label
)
Then, for example, for variable importance you can do:
# Plot Shap-based variable importance
ggplot(shap_imp, aes(reorder(Variable, Importance), Importance)) +
  geom_col() +
  coord_flip() +
  xlab("") +
  ylab("mean(|Shapley value|)")
Also, if you want explanations for individual predictions, the following is possible:
# Plot individual explanations
expl <- fastshap::explain(model, X = x, pred_wrapper = pfun, nsim = 10, newdata = x[1L, ])
autoplot(expl, type = "contribution")
All of this information, and more, can be found here: https://bgreenwell.github.io/fastshap/articles/fastshap.html. Check the link to resolve any remaining doubts. :)
I released two R packages to perform such tasks: one is "kernelshap" (crunching), the other is "shapviz" (plotting).
library(randomForest)
library(kernelshap)
library(shapviz)
set.seed(1)
fit <- randomForest(Sepal.Length ~ ., data = iris)
# bg_X is usually a small (50-200 rows) subset of the data
# Step 1: Calculate Kernel SHAP values
s <- kernelshap(fit, iris[-1], bg_X = iris)
# Step 2: Turn them into a shapviz object
sv <- shapviz(s)
# Step 3: Gain insights...
sv_importance(sv, show_numbers = TRUE)
sv_dependence(sv, v = "Petal.Length", color_var = "auto")
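The example above uses a randomForest regression on Sepal.Length, while the question asked about a binary ranger model. As a sketch of how the same workflow might be adapted (this assumes kernelshap's `pred_fun` argument accepts a custom prediction function; check `?kernelshap` for the exact interface of your installed version):

```r
library(ranger)
library(kernelshap)
library(shapviz)
library(dplyr)

# Binary target, as in the question
df <- iris
df$Target <- if_else(df$Species == "setosa", 1, 0)
df$Species <- NULL

model <- ranger(x = df %>% select(-Target), y = df %>% pull(Target))

# ranger's predict() returns a list, so a wrapper is supplied via pred_fun
s <- kernelshap(
  model,
  X = df %>% select(-Target),
  bg_X = df,  # in practice, use a small (50-200 row) background sample
  pred_fun = function(m, X) predict(m, data = X)$predictions
)
sv <- shapviz(s)
sv_importance(sv, show_numbers = TRUE)
```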