Logistic Regression different results with R and Python?

I fit a logistic regression in both R and Python (statsmodels) and am wondering why the results differ, especially the coefficients. The outcome, INFECTION, is binary (1/0), and Flushed is a continuous variable.

Python:

import statsmodels.api as sm
logit_model=sm.Logit(data['INFECTION'], data['Flushed'])
result=logit_model.fit()
print(result.summary())

Results:

                           Logit Regression Results                           
==============================================================================
Dep. Variable:              INFECTION   No. Observations:                  414
Model:                          Logit   Df Residuals:                      413
Method:                           MLE   Df Model:                            0
Date:                Fri, 24 Aug 2018   Pseudo R-squ.:                  -1.388
Time:                        15:47:42   Log-Likelihood:                -184.09
converged:                       True   LL-Null:                       -77.104
                                        LLR p-value:                       nan
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Flushed       -0.6467      0.070     -9.271      0.000      -0.783      -0.510
==============================================================================

R:

mylogit <- glm(INFECTION ~ Flushed, data = cvc, family = "binomial")
summary(mylogit)

Results:

Call:
glm(formula = INFECTION ~ Flushed, family = "binomial", data = cvc)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0598  -0.3107  -0.2487  -0.2224   2.8051  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.91441    0.38639 -10.131  < 2e-16 ***
Flushed      0.22696    0.06049   3.752 0.000175 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

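Your Python model is missing the constant (intercept) term. Unlike R's glm, sm.Logit does not add an intercept automatically, which is why the Python summary reports Df Model: 0 and only a single coefficient.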

In R's formula syntax, you are fitting two different models:

Python model: INFECTION ~ 0 + Flushed
R model     : INFECTION ~ Flushed
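
Written out, the difference is a single term. The symbols here are introduced only for illustration: p is the probability that INFECTION = 1, x is Flushed, and \beta_0, \beta_1 are the intercept and slope.

```latex
% Python model as written (no intercept): the curve is forced through p = 0.5 at x = 0
\log\frac{p}{1-p} = \beta_1 x

% R model (intercept included by default)
\log\frac{p}{1-p} = \beta_0 + \beta_1 x
```

Because the first model has to absorb the low baseline infection rate into the slope, its Flushed coefficient is not comparable to the one R reports.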

To add a constant to the Python model, use sm.add_constant(...).
