How do I create a SHAP plot in R for a GBM model?

I want to create a SHAP plot of feature importance for a GBM model:

library(caret)

# 10-fold CV repeated 5 times; class probabilities are needed for ROC
ctrlCV <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                       classProbs = TRUE, savePredictions = TRUE,
                       summaryFunction = twoClassSummary)

# gbmGRID is my tuning grid (defined elsewhere)
gbmFit <- train(CR ~ ., data = training_set,
                method = "gbm",
                metric = "ROC",
                trControl = ctrlCV,
                tuneGrid = gbmGRID,
                verbose = FALSE)

However, all the examples I found are for xgboost models, and packages such as SHAPforxgboost and shapr don't work for me. For example:

library(SHAPforxgboost)
shap_values <- shap.values(xgb_model = gbmFit, X_train = training_set)

produces an error:

error in `colnames<-`(`*tmp*`, value = c(colnames(x_train), "bias")) : attempt to set 'colnames' on an object with less than two dimensions

I need a plot like this:

[image]

How can I do that?

EDIT: my training set, from dput():

structure(list(CR = c("nonComplete", "nonComplete", "nonComplete", 
"nonComplete", "nonComplete", "nonComplete", "nonComplete", "nonComplete", 
"nonComplete", "nonComplete"), gender = c(1, 0, 0, 0, 1, 0, 0, 
1, 0, 1), CD4.T.cells = c(-0.0741098696855045, -0.094401270881699, 
0.0410284948786532, -0.163302950330185, -0.0942478217207681, 
-0.167314411991775, -0.118272811489486, -0.0366277340916379, 
-0.0809646843667242, -0.140727850456348), CD8.T.cells = c(-0.178835447722468, 
-0.253897294559596, -0.0372301980787381, -0.230579110769457, 
-0.224125346052727, -0.196933050675633, -0.344608041139497, -0.0550538743643369, 
-0.276178546845023, -0.235047665605314), T.helpers = c(-0.0384421660291032, 
-0.0275306107582565, 0.186447606591857, -0.124972070102036, -0.15348122673842, 
-0.106812144494277, -0.104757782473888, 0.0686746776877563, -0.0729755869081981, 
-0.0783448555726869), NK.cells = c(-0.0924083910597563, -0.172356328661097, 
-0.0172673823614314, 0.0280649471541352, -0.128925304635747, 
-0.0875076743713435, -0.188649323737844, -0.0518877213975413, 
-0.184546079512101, -0.100562282085102), Monocytes = c(-0.0680848706469295, 
-0.173427291586957, -0.0106773958944477, -0.0015805672257001, 
-0.0751114943036091, -0.0737177243152751, -0.211297995211542, 
-0.0674023045286274, -0.149380203815874, -0.0352058106388986), 
    Neutrophils = c(-0.0391833488213571, -0.0275279418713283, 
    0.0156454755097513, 0.0285160860867748, -0.0633367938488132, 
    0.0252778805872529, -0.0827920017974784, 0.0432343965225797, 
    -0.0693846217599099, -0.0249227307025501), gd.T.Cells = c(-0.162246594987039, 
    -0.297759223265742, -0.0814825699645205, -0.0688779846190755, 
    -0.222281334925374, -0.264420103679214, -0.251924422671008, 
    -0.162709306032616, -0.292342418053931, -0.246818199922858
    ), Non.plasma.B.cells = c(-0.0384755654971015, -0.114370815587458, 
    0.161268251261644, -0.0571463865006043, -0.112851511342984, 
    -0.0822058328898433, -0.118367014322845, 0.114155959200915, 
    -0.0923514068231641, -0.115614038543851)), row.names = c("Pt1", 
"Pt10", "Pt101", "Pt103", "Pt106", "Pt11", "Pt17", "Pt18", "Pt26", 
"Pt27"), class = "data.frame")

I've faced this problem before, and for me it only worked with xgboost models. This should work for you, using the shapviz package:

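library(shapviz)

# `model` is assumed to be a fitted xgboost model (xgb.Booster);
# column 1 of `data` is the outcome, so it is dropped from X_pred
shp <- shapviz(model, X_pred = data.matrix(data[, -1]), X = data)
sv_waterfall(shp, row_id = 1)          # explanation of a single observation
sv_importance(shp, kind = "beeswarm")  # SHAP summary (beeswarm) plot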
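shapviz can also plot SHAP values computed by other packages, so one model-agnostic route for a caret GBM is Kernel SHAP. The sketch below assumes the kernelshap package, whose kernelshap() output shapviz() accepts; note that it swaps tree-specific SHAP for Kernel SHAP (slower, but it only needs a prediction function). The toy data, the "Complete" class label and the tuning values are hypothetical stand-ins for training_set and gbmGRID.

```r
library(caret)       # train(), trainControl()
library(gbm)         # backend for method = "gbm"
library(kernelshap)  # model-agnostic Kernel SHAP
library(shapviz)     # SHAP plots

set.seed(1)
# Hypothetical toy data standing in for training_set:
# a two-class factor outcome CR plus a few numeric features
n <- 200
toy <- data.frame(
  CD4.T.cells = rnorm(n),
  CD8.T.cells = rnorm(n),
  NK.cells    = rnorm(n)
)
toy$CR <- factor(ifelse(toy$CD4.T.cells + rnorm(n, sd = 0.5) > 0,
                        "Complete", "nonComplete"))

# Small, fast caret GBM fit (illustrative tuning values only)
fit <- train(CR ~ ., data = toy, method = "gbm",
             trControl = trainControl(method = "cv", number = 3,
                                      classProbs = TRUE),
             tuneGrid = expand.grid(n.trees = 50, interaction.depth = 2,
                                    shrinkage = 0.1, n.minobsinnode = 10),
             verbose = FALSE)

# Feature columns only (the outcome CR is excluded)
X <- toy[, c("CD4.T.cells", "CD8.T.cells", "NK.cells")]

# Prediction function returning P(CR == "Complete") for each row
pred_fun <- function(object, newdata) {
  predict(object, newdata, type = "prob")[, "Complete"]
}

# Explain 50 rows against 50 other rows used as background data
ks <- kernelshap(fit, X = X[1:50, ], bg_X = X[51:100, ], pred_fun = pred_fun)

# Hand the SHAP values to shapviz and draw the same plots as above
shp <- shapviz(ks)
sv_importance(shp, kind = "beeswarm")
sv_waterfall(shp, row_id = 1)
```

With real data, `fit` would be `gbmFit` and `X` would be `training_set` without the CR column. Because Kernel SHAP calls the model many times, it is common to explain a subset of rows against a small background sample rather than the full training set.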
