I'm attempting to implement mixed effects logistic regression in Python. As a point of comparison, I'm using the glmer function from the lme4 package in R.
I've found that the statsmodels module has a BinomialBayesMixedGLM that should be able to fit such a model. However, I've encountered a number of issues:

1. I've found the documentation for the statsmodels function to be not entirely helpful or clear, so I'm not completely sure how to use the function appropriately.
2. My results don't match those from glmer in R.
3. The BinomialBayesMixedGLM function does not calculate p-values since it is Bayesian, but I can't seem to figure out how to access the full posterior distributions for the parameters.

As a test case, I'm using the titanic dataset available here.
import os
import pandas as pd
import statsmodels.genmod.bayes_mixed_glm as smgb

titanic = pd.read_csv(os.path.join(os.getcwd(), 'titanic.csv'))

# Variance components: '0 + Pclass' gives each Pclass level its own intercept
r = {"Pclass": '0 + Pclass'}
mod = smgb.BinomialBayesMixedGLM.from_formula('Survived ~ Age', r, titanic)
fit = mod.fit_map()
fit.summary()
# Type Post. Mean Post. SD SD SD (LB) SD (UB)
# Intercept M 3.1623 0.3616
# Age M -0.0380 0.0061
# Pclass V 0.0754 0.5669 1.078 0.347 3.351
However, aside from the slope for Age, this doesn't appear to match what I get in R with glmer(Survived ~ Age + (1 | Pclass), data = titanic, family = "binomial"):
Random effects:
Groups Name Variance Std.Dev.
Pclass (Intercept) 0.8563 0.9254
Number of obs: 887, groups: Pclass, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.961780 0.573402 1.677 0.0935 .
Age -0.038708 0.006243 -6.200 5.65e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So what error am I making when creating my model in Python? And, once that's addressed, how can I extract either the posteriors or p-values? Lastly, are there any Python implementations of mixed effects logistic regression that are more akin to the implementation in R?
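(One rough route to p-values from the fit_map output itself: fit_map reports a posterior mean and SD for each fixed effect via a Laplace approximation, and treating that posterior as normal gives a Wald-style z-test. This is an approximation layered on a Bayesian fit, not an exact frequentist p-value, but it lines up with glmer here. A minimal sketch using the Age numbers from the summary above:)

```python
import math

def wald_p_value(post_mean, post_sd):
    """Two-sided normal-tail p-value for z = mean / sd.

    Treats the Laplace-approximate posterior from fit_map as normal;
    a rough Wald-style check, not an exact Bayesian quantity.
    """
    z = post_mean / post_sd
    return math.erfc(abs(z) / math.sqrt(2))

# Age coefficient from the fit_map summary above: mean -0.0380, SD 0.0061
p_age = wald_p_value(-0.0380, 0.0061)  # ~5e-10, in line with glmer's 5.65e-10
```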
I just had to do something similar with Python. As suggested in the comments, Pymer4 appeared to offer a suitable approach (especially if you are familiar with R anyway). Using the example 'titanic' dataset referred to in the question:
from pymer4.models import Lmer

model = Lmer("Survived ~ Age + (1|Pclass)", data=titanic, family='binomial')
print(model.fit())
OUT:
Formula: Survived~Age+(1|Pclass)
Family: binomial Inference: parametric
Number of observations: 887 Groups: {'Pclass': 3.0}
Log-likelihood: -525.812 AIC: 1057.624
Random effects:
Name Var Std
Pclass (Intercept) 0.856 0.925
No random effect correlations specified
Fixed effects:
Estimate 2.5_ci 97.5_ci SE OR OR_2.5_ci OR_97.5_ci \
(Intercept) 0.962 -0.162 2.086 0.573 2.616 0.85 8.050
Age -0.039 -0.051 -0.026 0.006 0.962 0.95 0.974
Prob Prob_2.5_ci Prob_97.5_ci Z-stat P-val Sig
(Intercept) 0.723 0.460 0.889 1.677 0.093 .
Age 0.490 0.487 0.493 -6.200 0.000 ***
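If you want the estimates programmatically rather than by parsing the printed summary: in the pymer4 versions I've used, the fitted model keeps the fixed-effects table as a pandas DataFrame in model.coefs (verify the attribute name against your installed version). Indexing works like any DataFrame, sketched here with a stand-in built from the values printed above:

```python
import pandas as pd

# Stand-in for model.coefs, built from the fixed-effects output above;
# the real attribute is populated when model.fit() runs.
coefs = pd.DataFrame(
    {"Estimate": [0.962, -0.039], "SE": [0.573, 0.006], "P-val": [0.093, 0.000]},
    index=["(Intercept)", "Age"],
)

age_slope = coefs.loc["Age", "Estimate"]  # -0.039
age_p = coefs.loc["Age", "P-val"]
```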
Just as an additional comment (sorry for diverting from the main question): I ran this on an Ubuntu 20.04 machine with Python 3.8.8. Not sure if anyone else encountered this problem, but when running the model above with Pymer4, the package threw an error (the same error occurred when I tried to reproduce a similar model from the Pymer4 documentation here):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
referring to this line in the /pymer4/models/Lmer.py package file:
--> 444 if design_matrix:
I fixed this by changing the line to (not sure if this is the most elegant or safest approach, happy to be corrected here):
if design_matrix.any():
This appeared to get the package running and provide R-equivalent results in the few cases that I tested.
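For anyone applying the same patch: the error comes from NumPy refusing to coerce a multi-element array to a single bool. Assuming design_matrix in Lmer.py is either None or an array (an assumption worth checking against the source), a more explicit test avoids the ambiguity without relying on .any(), which asks a subtly different question (whether any entry is nonzero, which would be False for an all-zeros matrix):

```python
import numpy as np

def has_design_matrix(design_matrix):
    # Explicit check: not None and contains at least one element.
    # `if design_matrix:` raises ValueError for multi-element arrays,
    # and `.any()` would be False for an all-zeros (but present) matrix.
    return design_matrix is not None and design_matrix.size > 0

dm = np.array([[1.0, 22.0], [1.0, 38.0]])  # hypothetical design matrix
print(has_design_matrix(dm))    # True
print(has_design_matrix(None))  # False
```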