简体   繁体   中英

Mixed effects logistic regression

I'm attempting to implement mixed effects logistic regression in python. As a point of comparison, I'm using the glmer function from the lme4 package in R.

I've found that the statsmodels module has a BinomialBayesMixedGLM that should be able to fit such a model. However, I've encountered a number of issues:

  1. I find the documentation for the statsmodels function to be not entirely helpful or clear, so I'm not completely sure how to use the function appropriately.
  2. So far, my attempts have not produced results that replicate what I get when fitting the model with glmer in R.
  3. I expect the BinomialBayesMixedGLM function does not calculate p-values since it is Bayesian, but I can't seem to figure out how to access the full posterior distributions for the parameters.

As a test case, I'm using the titanic dataset available here .

import os
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))

r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()
fit.summary()

#           Type    Post. Mean  Post. SD       SD SD (LB) SD (UB)
# Intercept M           3.1623    0.3616            
# Age       M          -0.0380    0.0061            
# Pclass    V           0.0754    0.5669    1.078   0.347   3.351

However, aside from the slope for Age, this doesn't appear to match what I get in R with glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial") :

Random effects:
 Groups Name        Variance Std.Dev.
 Pclass (Intercept) 0.8563   0.9254  
Number of obs: 887, groups:  Pclass, 3

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.961780   0.573402   1.677   0.0935 .  
Age         -0.038708   0.006243  -6.200 5.65e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So what error am I making when creating my model in python? And, once that's addressed, how can I extract either the posteriors or p-values? Lastly, are their any python implementations of mixed effect logistic regressions that are more akin to the implementation in R?

Just had to do something similar with Python , as suggested in the comments Pymer4 appeared to offer a suitable approach (especially if you are familiar with R anyway). Using the example dataset 'titanic' referred to in the question:

from pymer4.models import Lmer

model = Lmer("Survived  ~ Age  + (1|Pclass)",
             data=titanic, family = 'binomial')

print(model.fit())

OUT:

Formula: Survived~Age+(1|Pclass)

Family: binomial     Inference: parametric

Number of observations: 887  Groups: {'Pclass': 3.0}

Log-likelihood: -525.812     AIC: 1057.624

Random effects:

               Name    Var    Std
Pclass  (Intercept)  0.856  0.925

No random effect correlations specified

Fixed effects:

             Estimate  2.5_ci  97.5_ci     SE     OR  OR_2.5_ci  OR_97.5_ci  \
(Intercept)     0.962  -0.162    2.086  0.573  2.616       0.85       8.050   
Age            -0.039  -0.051   -0.026  0.006  0.962       0.95       0.974   

              Prob  Prob_2.5_ci  Prob_97.5_ci  Z-stat  P-val  Sig  
(Intercept)  0.723        0.460         0.889   1.677  0.093    .  
Age          0.490        0.487         0.493  -6.200  0.000  *** 

Just as an additional comment (sorry for diverting from the main question), I ran this on an Ubuntu 20.04 machine with Python 3.8.8 . Not sure if anyone else encountered this problem, but when running the model above with Pymer4 the package threw an error (Same error when I tried to reproduce a similar model form the Pymer4 documentation here ):

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

referring to this line in the /pymer4/models/Lmer.py package file:

--> 444         if design_matrix:

I fixed this by changing this to (not sure if this is the most elegant or safest approach, happy to be corrected here):

if design_matrix.any():

This appeared to get the package running and provide R-equivalent results in the few cases that I tested.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM