Issue on shap's repo: https://github.com/slundberg/shap/issues/2783
So currently, I know how to convert the base (expected) value from log odds to probability:
import numpy as np
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)

# convert the base value from log odds to probability
odds = np.exp(explainer.expected_value)
odds / (1 + odds)
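As a quick illustration of that conversion, here is the same formula applied to a made-up base value (the number is hypothetical, not from any real model):

```python
import numpy as np

# hypothetical base (expected) value in log-odds space
expected_value = -0.4

# log odds -> probability: p = odds / (1 + odds) = 1 / (1 + exp(-x))
odds = np.exp(expected_value)
base_prob = odds / (1 + odds)
```

A base value of -0.4 in log odds corresponds to a base probability of about 0.40.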
This works fine, but the problem comes when I try to convert each individual SHAP value to a probability increase/decrease. That formula doesn't work there, so I'm wondering how I can get the percent increase/decrease that each feature contributes.
Basically, what percentage does each of the bar lengths (like the length I annotated in red in the picture) take up?
I'm looking for a discrete number that corresponds to the percent increase/decrease for each feature's bar (in probability, not log odds).
# this generates the force plot
shap.force_plot(
    explainer.expected_value,
    shap_values[1, :],
    X_train.iloc[1, :],
    link='logit'
)
I think I may have found the answer. I'm not sure it's correct, because the only way to compare is by visual approximation, but here is what I came up with. If anyone could try it out and determine whether the calculation is off, that would be amazing!
First, we have to create a helper function to convert log odds to probabilities:
import numpy as np

def lo_to_prob(x):
    # convert log odds to a probability
    odds = np.exp(x)
    return odds / (1 + odds)
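As a sanity check, this helper is just the logistic sigmoid, so it should agree with 1 / (1 + exp(-x)) everywhere; a minimal self-contained check:

```python
import numpy as np

def lo_to_prob(x):
    # convert log odds to a probability
    odds = np.exp(x)
    return odds / (1 + odds)

# the function is the logistic sigmoid: lo_to_prob(x) == 1 / (1 + exp(-x))
for x in (-2.0, 0.0, 3.5):
    assert abs(lo_to_prob(x) - 1 / (1 + np.exp(-x))) < 1e-12

# log odds of 0 correspond to probability 0.5
print(lo_to_prob(0.0))
```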
Set up our SHAP explainer:
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)
This is the conversion formula I found:
# this is the row (observation) we want to find the SHAP outcomes for
observation = 0
# this is the column (feature) we want to find the percent change for
column_num = 2

# this formula gives us the outcome probability:
# base value plus the sum of all SHAP values, converted to a probability
shap_outcome_val = lo_to_prob(
    explainer.expected_value + shap_values[observation, :].sum()
)

# this gives us the outcome probability without the single column's contribution
shap_outcome_minus_one_column = lo_to_prob(
    explainer.expected_value
    + shap_values[observation, :].sum()
    - shap_values[observation, column_num]
)

# simply subtract the 2 probabilities to get the probability change
# that the one column provides
pct_change = shap_outcome_val - shap_outcome_minus_one_column
pct_change
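To make the formula concrete, here is the same calculation run on toy numbers (the base value and SHAP values below are invented purely for illustration, not output from a real model):

```python
import numpy as np

def lo_to_prob(x):
    # convert log odds to a probability
    odds = np.exp(x)
    return odds / (1 + odds)

# toy numbers for illustration only
expected_value = -0.4
row_shap = np.array([0.3, -0.1, 0.5])  # hypothetical SHAP values for one row
column_num = 2

# outcome probability with all features included
full_prob = lo_to_prob(expected_value + row_shap.sum())

# outcome probability with the one column's contribution removed
without_col = lo_to_prob(expected_value + row_shap.sum() - row_shap[column_num])

pct_change = full_prob - without_col
```

One caveat worth noting: because the sigmoid is nonlinear, the per-feature probability changes computed this way will not, in general, sum exactly to the total probability change (full_prob minus the base probability), so the numbers are approximations of each bar's length rather than an exact additive decomposition.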
Check the graph to see whether the length of the bar for the column we're interested in roughly matches the value we get from the calculation:
shap.force_plot(
    explainer.expected_value,
    shap_values[observation, :],
    X_train.iloc[observation, :],
    link='logit'
)
Again, I'm not sure this is 100% correct, as the only way to verify is visually, but it looks close. Try it out and let me know.
To get SHAP values directly in probability space from the explainer, you can pass a background dataset and set the model output:
explainer = shap.TreeExplainer(model, data=train, model_output='probability')
The resulting SHAP values are then in probability units, so no log-odds conversion is needed.