Get marginal effects for sklearn logistic regression

I want to get the marginal effects of a logistic regression fitted with sklearn.

I know you can get these for a statsmodels logistic regression using '.get_margeff()'. Is there nothing equivalent for sklearn? I want to avoid doing the calculation myself, as I feel there would be a lot of room for error.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_breast_cancer
from statsmodels.tools.tools import add_constant

data = load_breast_cancer()
x = data.data
y = data.target
# prepend an intercept column; it becomes column 0 of x
x = add_constant(x, has_constant='add')

model = sm.Logit(y, x).fit_regularized()
margeff = model.get_margeff(dummy=True, count=True)
# print the marginal effects
print(margeff.margeff)
>> [ 6.73582136e-02  2.15779589e-04  1.28857837e-02 -1.06718136e-03
 -1.96032750e+00  1.36137385e+00 -1.16303369e+00 -1.37422595e+00
  8.14539021e-01 -1.95330095e+00 -4.86235558e-01  4.84260993e-02
  7.16675627e-02 -2.89644712e-03 -5.18982198e+00 -5.93269894e-01
  3.22934080e+00 -1.28363008e+01  3.07823155e+00  5.84122170e+00
  1.92785670e-02 -9.86284081e-03 -7.53298463e-03 -3.52349287e-04
  9.13527446e-01  1.69938656e-01 -2.89245493e-01 -4.65659522e-01
 -8.32713335e-01 -1.15567833e+00]


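# Manual calculation. I am doing this because you can take coef_ from a sklearn model and use it in the function.

def PDF(XB):
    # logistic density: exp(XB) / (1 + exp(XB))^2
    exp_xb = np.exp(XB)
    return exp_xb / np.power(1 + exp_xb, 2)

arrPDF = PDF(np.dot(x, model.params))
# outer product: row i holds the marginal effects for observation i
ME = pd.DataFrame(np.dot(arrPDF[:, None], model.params[None, :]))
# drop the constant (column 0) and average over observations
print(ME.iloc[:, 1:].mean().to_list())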

>>
[0.06735821358791198, 0.0002157795887363032, 0.012885783711597246, -0.0010671813611730326, -1.9603274961356965, 1.361373851981879, -1.1630336876543224, -1.3742259536619654, 0.8145390210646809, -1.9533009514684947, -0.48623555805230195, 0.04842609927469917, 0.07166756271689229, -0.0028964471200298475, -5.189821981601878, -0.5932698935239838, 3.229340802910038, -12.836300822253634, 3.0782315528664834, 5.8412217033605245, 0.019278567008384557, -0.009862840813512401, -0.007532984627259091, -0.0003523492868714151, 0.9135274456151128, 0.16993865598225097, -0.2892454926120402, -0.46565952159093893, -0.8327133347971125, -1.1556783345783221]

The custom function gives the same result as '.get_margeff()', but there might be a lot of room for error when using the sklearn coef_ in the custom function above.
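For reference, the calculation that the custom function performs can be written out as below. This is the standard derivative of the logistic function, which is why it agrees with the default '.get_margeff()' settings (at='overall', method='dydx') for continuous features. The symbols are chosen here for illustration: x_i is row i of the data including the constant, beta is the coefficient vector, and n is the number of rows.

```latex
\begin{aligned}
P(y_i = 1 \mid x_i) &= \Lambda(x_i^\top \beta) = \frac{e^{x_i^\top \beta}}{1 + e^{x_i^\top \beta}}, \\
\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{ij}} &= \frac{e^{x_i^\top \beta}}{\left(1 + e^{x_i^\top \beta}\right)^2}\, \beta_j, \\
\mathrm{AME}_j &= \frac{1}{n} \sum_{i=1}^{n} \frac{e^{x_i^\top \beta}}{\left(1 + e^{x_i^\top \beta}\right)^2}\, \beta_j .
\end{aligned}
```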

  1. Is there some method, function, or attribute in sklearn that can give me the marginal effects?
  2. If there is not, is there another library that can get from the coef_ and the data to the marginal effects?
  3. If the answer to both of the above is no, are there any circumstances in which the custom function will not work (e.g. with a particular solver or penalty in sklearn)?
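On question 3, the formula only uses the fitted coefficients and the data, so for a binary model it should not matter which solver or penalty produced them (a multinomial model would need a different formula). Below is a minimal sketch of the same calculation on a sklearn LogisticRegression, checked against a finite difference of predict_proba. The helper name average_marginal_effects is hypothetical and not part of scikit-learn. The features are standardized here, so the effects are per unit of the scaled features, and sklearn applies an L2 penalty by default, so the numbers will not match the statsmodels output above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def average_marginal_effects(clf, X):
    """Average marginal effects for a fitted binary LogisticRegression.

    Hypothetical helper (not a scikit-learn function). For each row,
    dP(y=1|x)/dx_j = p * (1 - p) * coef_j; the result is the mean over rows.
    """
    p = clf.predict_proba(X)[:, 1]   # P(y=1 | x) for every row
    density = p * (1.0 - p)          # equals exp(xb) / (1 + exp(xb))^2
    return density.mean() * clf.coef_.ravel()


X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the solver converge
clf = LogisticRegression(max_iter=1000).fit(X, y)

ame = average_marginal_effects(clf, X)

# Sanity check: central finite difference of predict_proba for feature 0
eps = 1e-5
X_hi, X_lo = X.copy(), X.copy()
X_hi[:, 0] += eps
X_lo[:, 0] -= eps
fd = (clf.predict_proba(X_hi)[:, 1] - clf.predict_proba(X_lo)[:, 1]).mean() / (2 * eps)

print(ame[0], fd)  # the two numbers should agree closely
```

Because the helper reads probabilities from predict_proba, the intercept is handled by sklearn itself and no constant column needs to be added or dropped.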

I ran into the same need a few days ago.

My supervisor pointed me to the following, which I want to share. I hope it helps.

partial_dependence: This function computes the partial dependence of the prediction on a feature. It is related to the marginal effects you mean, though not identical to them.

plot_partial_dependence: This function plots the partial dependence.

Here is sample code adapted from the API Reference.

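scikit-learn version: 0.21.2. Note that plot_partial_dependence was removed in scikit-learn 1.2; newer releases use PartialDependenceDisplay.from_estimator instead.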

from sklearn.inspection import plot_partial_dependence, partial_dependence
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor


# Jupyter/IPython magic; omit this line when running as a plain script
%matplotlib inline


X, y = make_friedman1()

# case1: linear model
lm = LinearRegression().fit(X, y)
# plot the partial dependence
plot_partial_dependence(lm, X, [0, (0, 1)])
# get the partial dependence
partial_dependence(lm, X, [0])

# case2: gradient boosting model (despite the name clf, this is a regressor, not a classifier)
clf = GradientBoostingRegressor(n_estimators=10).fit(X, y)
# plot the partial dependence
plot_partial_dependence(clf, X, [0, (0, 1)])
# get the partial dependence
partial_dependence(clf, X, [0])
