Logistic regression results from sklearn and statsmodels don't match

Question

I ran logistic regression with both the sklearn and statsmodels libraries. Their results are close but not identical. For example, the (slope, intercept) pair from sklearn is (-0.84371207, 1.43255005), while the pair from statsmodels is (-0.8501, 1.4468). Why do they differ, and how can I make them match?

import pandas as pd
import statsmodels.api as sm
from sklearn import linear_model

# Part I: sklearn logistic

url = "https://github.com/pcsanwald/kaggle-titanic/raw/master/train.csv"
titanic_train = pd.read_csv(url)

train_X = pd.DataFrame([titanic_train["pclass"]]).T
train_Y = titanic_train["survived"]

model_1 = linear_model.LogisticRegression(solver = 'lbfgs')
model_1.fit(train_X, train_Y)

print(model_1.coef_) # print slopes
print(model_1.intercept_ ) # print intercept

# Part II: statsmodels logistic

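train_X['intercept'] = 1  # statsmodels does not add an intercept automatically
model_2 = sm.Logit(train_Y, train_X)
result = model_2.fit(method='lbfgs')  # the solver is an argument of fit(), not of Logit()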
print(result.summary2())

Answer 1

Sklearn applies L2 regularisation by default (C=1.0), and statsmodels does not. The penalty shrinks the coefficients slightly, which accounts for the small difference. Try specifying penalty='none' in the sklearn model parameters and rerun. In scikit-learn 1.2 and later, the equivalent setting is penalty=None.

See the documentation for more information on logistic regression in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Answer 1: 2, accepted, 2021-11-14 13:44:55