
Multivariate linear mixed effects model in Python

I am playing around with this code, which fits a univariate linear mixed effects model. The data set uses the following variable names:

  • students as s
  • instructors as d
  • departments as dept
  • service as service

In the syntax of R's lme4 package (Bates et al., 2015), the model implemented can be summarized as:

y ~ 1 + (1|students) + (1|instructor) + (1|dept) + service

where 1 denotes an intercept term, (1|x) denotes a random effect for x, and x denotes a fixed effect.
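
Written out as an equation, the formula above can be read as the model below. This is a reading of the Python code in this question rather than lme4 output: the symbols mirror the variable names in that code, β stands for the `service` coefficient, and the unit residual variance reflects the `scale=tf.ones(n_obs)` line (lme4 itself would estimate the residual variance).

```latex
\begin{aligned}
y_i &= \mu + \beta\,\mathrm{service}_i
      + \eta^{(s)}_{s[i]} + \eta^{(d)}_{d[i]} + \eta^{(\mathrm{dept})}_{\mathrm{dept}[i]}
      + \varepsilon_i, \\
\eta^{(s)}_j &\sim \mathcal{N}(0, \sigma_s^2), \quad
\eta^{(d)}_k \sim \mathcal{N}(0, \sigma_d^2), \quad
\eta^{(\mathrm{dept})}_l \sim \mathcal{N}(0, \sigma_{\mathrm{dept}}^2), \\
\varepsilon_i &\sim \mathcal{N}(0, 1).
\end{aligned}
```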

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import edward as ed
    import pandas as pd
    import tensorflow as tf
    import matplotlib.pyplot as plt

    from edward.models import Normal
    from observations import insteval

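    # Load the InstEval data through the `observations` package.
    # "~/data" is an assumed download directory.
    data, metadata = insteval("~/data")
    data = pd.DataFrame(data, columns=metadata['columns'])

    # Assumed preprocessing: map instructor and department ids to contiguous
    # integer codes, since the 'dcodes' and 'deptcodes' columns are used below.
    data['dcodes'] = data['d'].astype('category').cat.codes
    data['deptcodes'] = data['dept'].astype('category').cat.codes

    # Hold out 20% of the rows for testing.
    train = data.sample(frac=0.8)
    test = data.drop(train.index)
    train.head()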


    s_train = train['s'].values
    d_train = train['dcodes'].values
    dept_train = train['deptcodes'].values
    y_train = train['y'].values
    service_train = train['service'].values
    n_obs_train = train.shape[0]

    s_test = test['s'].values
    d_test = test['dcodes'].values
    dept_test = test['deptcodes'].values
    y_test = test['y'].values
    service_test = test['service'].values
    n_obs_test = test.shape[0]
    n_s = max(s_train) + 1  # number of students
    n_d = max(d_train) + 1  # number of instructors
    n_dept = max(dept_train) + 1  # number of departments
    n_obs = train.shape[0]  # number of observations

    # Set up placeholders for the data inputs.
    s_ph = tf.placeholder(tf.int32, [None])
    d_ph = tf.placeholder(tf.int32, [None])
    dept_ph = tf.placeholder(tf.int32, [None])
    service_ph = tf.placeholder(tf.float32, [None])

    # Set up fixed effects.
    mu = tf.get_variable("mu", [])
    service = tf.get_variable("service", [])

    sigma_s = tf.sqrt(tf.exp(tf.get_variable("sigma_s", [])))
    sigma_d = tf.sqrt(tf.exp(tf.get_variable("sigma_d", [])))
    sigma_dept = tf.sqrt(tf.exp(tf.get_variable("sigma_dept", [])))

    # Set up random effects.
    eta_s = Normal(loc=tf.zeros(n_s), scale=sigma_s * tf.ones(n_s))
    eta_d = Normal(loc=tf.zeros(n_d), scale=sigma_d * tf.ones(n_d))
    eta_dept = Normal(loc=tf.zeros(n_dept), scale=sigma_dept * tf.ones(n_dept))

    yhat = (tf.gather(eta_s, s_ph) +
            tf.gather(eta_d, d_ph) +
            tf.gather(eta_dept, dept_ph) +
            mu + service * service_ph)
    y = Normal(loc=yhat, scale=tf.ones(n_obs))

    # Inference: variational approximations for the random effects.

    q_eta_s = Normal(
        loc=tf.get_variable("q_eta_s/loc", [n_s]),
        scale=tf.nn.softplus(tf.get_variable("q_eta_s/scale", [n_s])))
    q_eta_d = Normal(
        loc=tf.get_variable("q_eta_d/loc", [n_d]),
        scale=tf.nn.softplus(tf.get_variable("q_eta_d/scale", [n_d])))
    q_eta_dept = Normal(
        loc=tf.get_variable("q_eta_dept/loc", [n_dept]),
        scale=tf.nn.softplus(tf.get_variable("q_eta_dept/scale", [n_dept])))

    latent_vars = {
        eta_s: q_eta_s,
        eta_d: q_eta_d,
        eta_dept: q_eta_dept}
    data = {
        y: y_train,
        s_ph: s_train,
        d_ph: d_train,
        dept_ph: dept_train,
        service_ph: service_train}
    inference = ed.KLqp(latent_vars, data)
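
To make the `yhat` expression above more concrete, here is a small NumPy sketch of what the `tf.gather` lookups do: each observation picks out the random effect of its own student, instructor and department. All numeric values are made up for illustration, and NumPy integer indexing is used here as a stand-in for `tf.gather`.

```python
import numpy as np

# Hypothetical effect values: 3 students, 2 instructors, 2 departments.
eta_s = np.array([0.5, -0.2, 0.1])
eta_d = np.array([0.3, -0.3])
eta_dept = np.array([0.0, 0.4])
mu, service_coef = 3.0, -0.1

# One entry per observation: integer codes index into the effect vectors.
s_idx = np.array([0, 0, 2, 1])
d_idx = np.array([1, 0, 0, 1])
dept_idx = np.array([0, 1, 1, 0])
service_x = np.array([0.0, 1.0, 1.0, 0.0])

# Integer-array indexing plays the role of tf.gather here.
yhat = (eta_s[s_idx] + eta_d[d_idx] + eta_dept[dept_idx]
        + mu + service_coef * service_x)
print(yhat)
```

This is also why the ids need to be contiguous integer codes starting at 0: they are used directly as positions in the effect vectors.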

This works fine for univariate linear mixed effects modelling. I am trying to extend this approach to the multivariate case. Any ideas are more than welcome.

There are a number of ways to fit linear mixed effects models in Python. It looks like you've adapted the TensorFlow approach, but if that is not a hard requirement, there are several other, potentially more convenient options.

  1. You can use the statsmodels implementation of linear mixed models (MixedLM), which stays entirely within Python, although its syntax differs somewhat from the formula expressions used by R's LMER. It looks like you are using Python to split your data into training and test sets, so you can also write a loop to call the

  2. You can also install R and rpy2 on your local machine and call the LMER packages from your Python environment. This lets you keep the R syntax you are familiar with while doing everything else in Python. In a Jupyter notebook, use the rmagic %%R cell magic (or %R for inline use) to pass variables and models between Python and R. This is useful if you want to pass the train/test data you split in Python to R, run lmer, and retrieve the parameters in a loop.

  3. Lastly, you can use Pymer4, a wrapper around rpy2 that lets you call LMER in R directly without having to deal with rmagic.
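
As a rough illustration of the first option in the list above, the sketch below fits a random-intercept model with statsmodels on synthetic data. The data, the column names and the true coefficient values are all made up; only a single grouping factor (students) is used, so this is a simplified univariate stand-in for the model in the question, not a full replacement for it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic stand-in for the InstEval data: 40 students, 10 ratings each.
n_students, n_per_student = 40, 10
s = np.repeat(np.arange(n_students), n_per_student)
service = rng.integers(0, 2, size=s.size).astype(float)
student_effect = rng.normal(0.0, 0.7, size=n_students)
y = 3.0 - 0.5 * service + student_effect[s] + rng.normal(0.0, 1.0, size=s.size)
df = pd.DataFrame({"y": y, "service": service, "s": s})

# Random intercept per student; roughly lme4's  y ~ 1 + service + (1 | s).
model = smf.mixedlm("y ~ service", data=df, groups=df["s"])
result = model.fit()
print(result.params)
```

Crossed random effects for instructors and departments would need more setup (statsmodels exposes a `vc_formula` argument for variance components), which is not shown here.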

I wrote a tutorial on how to use LMER with each of these methods, which also works on cloud setups such as Google Colab. These methods will all let you run the multivariate approach you asked about using LMER in R, but from a Python environment.
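
The answer does not say how the multivariate model would be specified. One commonly used workaround with univariate mixed-model software is to stack the responses into long format and add an outcome indicator, so that each response gets its own fixed and random effects. The sketch below shows only the reshaping step in pandas; the column names `y1` and `y2`, the values, and the formula in the final comment are assumptions for illustration.

```python
import pandas as pd

# Hypothetical wide data with two responses (y1, y2) per rating.
wide = pd.DataFrame({
    "s": [0, 0, 1, 1],
    "service": [0, 1, 0, 1],
    "y1": [3.0, 4.0, 2.0, 5.0],
    "y2": [1.5, 2.5, 1.0, 3.0],
})

# Stack the responses: one row per (observation, outcome) pair.
long_df = wide.melt(
    id_vars=["s", "service"],
    value_vars=["y1", "y2"],
    var_name="outcome",
    value_name="value",
)
print(long_df)

# Assumed lme4-style formula for the stacked data (not from the answer):
#   value ~ 0 + outcome + outcome:service + (0 + outcome | s)
```

The stacked frame can then be passed to whichever of the three options you choose. Note that this setup shares a single residual variance across outcomes unless the software allows otherwise.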
