
Python and SPSS giving different output for Logistic Regression

Code:

from sklearn.linear_model import LogisticRegression
l = LogisticRegression()
b = l.fit(XT, Y)
print("coeff ", b.coef_)
print("intercept ", b.intercept_)

Here's the dataset

XT =
[[23]
 [24]
 [26]
 [21]
 [29]
 [31]
 [27]
 [24]
 [22]
 [23]]
Y = [1 0 1 0 0 1 1 0 1 0]

Result:

coeff  [[ 0.00850441]]
intercept  [-0.15184511]

Now I entered the same data in SPSS via Analyze -> Regression -> Binary Logistic Regression, setting Y as the dependent variable and XT as a covariate. The results weren't even close. Am I missing something in Python or in SPSS? [Screenshots: SPSS binary logistic regression results; Python/sklearn output]

Solved it myself. I tried changing the C value: LogisticRegression(C=100) did the trick, and C=1000 got the result closest to SPSS and to the textbook result.

Hope this helps anyone who runs into this problem with LogisticRegression in Python.

SPSS logistic regression does not include parameter regularisation in its cost function; it does plain maximum-likelihood logistic regression. In scikit-learn, the cost function includes a regularisation term to prevent overfitting, and C specifies the inverse of the regularisation strength. If you set C to a very high value, the result will closely mimic SPSS, so there is no magic number: just set it as high as you can, and there will be effectively no regularisation.
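A short sketch of the effect on the question's data (the C values are illustrative; larger C means weaker regularisation, so the fitted coefficient moves away from zero toward the unregularised maximum-likelihood estimate):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

XT = np.array([[23], [24], [26], [21], [29], [31], [27], [24], [22], [23]])
Y = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Default: C=1.0, i.e. noticeable L2 regularisation on this tiny dataset
regularised = LogisticRegression(C=1.0, max_iter=1000).fit(XT, Y)

# Very large C: the penalty term is negligible, approximating plain
# maximum-likelihood logistic regression (what SPSS reports)
unregularised = LogisticRegression(C=1e4, max_iter=1000).fit(XT, Y)

print("C=1    coeff:", regularised.coef_[0, 0])
print("C=1e4  coeff:", unregularised.coef_[0, 0])
```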

With sklearn you can also turn the regularization off entirely by setting the penalty to None. No regularization is then applied, and the logistic regression results from sklearn will be comparable to those from SPSS.

An example of a logistic regression from sklearn with 1000 iterations and no penalty is:

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(max_iter=1000, penalty=None)  # on scikit-learn < 1.2, use penalty='none' instead
