
Python: how can I convert SHAP values to probability increases/decreases?

Issue on shap's repo: https://github.com/slundberg/shap/issues/2783

So currently, I know how to convert the base (expected) value from log odds to probability, with

import numpy as np
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)
odds = np.exp(explainer.expected_value)
base_prob = odds / (1 + odds)

This works fine, but the problem comes when I try to convert each individual SHAP value to a probability increase/decrease: that formula doesn't work on them. So I'm wondering how I can get the percent increase/decrease that each feature contributes.

(image: SHAP force plot, with one feature's bar length annotated in red)

Basically, what percentage does each of the lengths (like the one I annotated in red in the picture) take up?

I'm looking for a discrete number corresponding to the percent increase/decrease (in probability, not log odds) for each feature's bar.

# this generates the plot
shap.force_plot(
    explainer.expected_value,
    shap_values[1, :],
    X_train.iloc[1, :],
    link='logit'
)

I think I may have found the answer. I'm not sure if it's correct, because the only way to compare is by visual approximation, but here is what I came up with. If anyone could try it out and determine whether the calculation is off, that would be amazing!

First, we have to create a helper function to convert log odds to probabilities:

import numpy as np

def lo_to_prob(x):
    """Convert log odds to a probability (logistic sigmoid)."""
    odds = np.exp(x)
    return odds / (1 + odds)
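This helper is just the standard logistic sigmoid, so a quick sanity check needs no model at all: zero log odds should map to a probability of 0.5, and the function should be symmetric around that point. A minimal, self-contained check in plain NumPy:

```python
import numpy as np

def lo_to_prob(x):
    """Convert log odds to a probability (logistic sigmoid)."""
    odds = np.exp(x)
    return odds / (1 + odds)

# zero log odds is a 50/50 outcome
print(lo_to_prob(0.0))  # 0.5

# symmetry: p(x) + p(-x) should be (numerically) 1
print(lo_to_prob(2.0) + lo_to_prob(-2.0))
```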

Set up our SHAP explainer:

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_train)

This is the conversion formula I found:

# this is the row that we want to find the shap outcomes for
observation = 0

# this is the column that we want to find the percent change for
column_num = 2

# this formula gives us the outcome probability number
shap_outcome_val = lo_to_prob(
    explainer.expected_value + shap_values[observation, :].sum()
)

# this gives us the SHAP outcome value, without the single column
shap_outcome_minus_one_column = lo_to_prob(
    explainer.expected_value + shap_values[observation, :].sum() - shap_values[observation, column_num]
)

# simply subtract the 2 probabilities to get the pct increase that the one column provides
pct_change = shap_outcome_val - shap_outcome_minus_one_column
pct_change
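The same leave-one-out calculation can be done for every feature at once with NumPy broadcasting. Below is a sketch using made-up numbers in place of `explainer.expected_value` and one row of `shap_values` (so it runs without a model). One caveat worth knowing: because the logit link is non-linear, these per-feature deltas will not sum exactly to the total probability change — they are an approximation of each bar's length, which is consistent with only being able to verify against the plot visually.

```python
import numpy as np

def lo_to_prob(x):
    odds = np.exp(x)
    return odds / (1 + odds)

# hypothetical stand-ins for explainer.expected_value and shap_values[observation, :]
expected_value = -0.5
row_shap = np.array([0.8, -0.3, 0.4, 0.1])

total_log_odds = expected_value + row_shap.sum()
full_prob = lo_to_prob(total_log_odds)

# leave-one-out probability delta for every column at once
pct_change_per_feature = full_prob - lo_to_prob(total_log_odds - row_shap)
print(pct_change_per_feature)
```

Each entry is the same quantity as `pct_change` above, computed for the corresponding column in one vectorized step; the sign of each delta matches the sign of the underlying SHAP value, since the sigmoid is monotonically increasing.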

Check the graph to see whether the length of the bar for the column we're interested in roughly matches the value from the calculation:

shap.force_plot(
    explainer.expected_value,
    shap_values[observation, :],
    X_train.iloc[observation, :],
    link='logit'
)

Again, I'm not sure if this is 100% correct, as the only way to verify is visually. It looks close, though. Try it out and let me know.

To get SHAP values directly on the probability scale, you can construct the explainer with model_output='probability' (this requires passing background data):

explainer = shap.TreeExplainer(model, data=train, model_output='probability')
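With model_output='probability', the expected value and the SHAP values are already expressed on the probability scale, so they add up linearly to the predicted probability and each SHAP value can be read directly as that feature's probability increase/decrease — no logit conversion needed. The numbers below are made up for illustration (not real explainer output), but they show the additivity property you get in this mode:

```python
import numpy as np

# hypothetical numbers standing in for a probability-space explainer's output
expected_value = 0.31                            # base rate, already a probability
row_shap = np.array([0.12, -0.05, 0.08, 0.02])   # per-feature probability contributions

# additivity holds directly on the probability scale
predicted_prob = expected_value + row_shap.sum()
print(predicted_prob)  # ~0.48
```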
