How to decide when and how to include covariates in a linear mixed-effects model in lme4

I am running a linear mixed-effects model in R, and I'm not sure how to include a covariate of no interest in the model, or even how to decide if I should do that.

I have two within-subject variables, let's call them A and B with two levels each, with lots of observations per participant. I'm interested in how their interaction changes across 4 groups. My outcome is reaction time. At the simplest level, I have this model:

RT ~ 1 + A*B*Groups + (1+A | Subject ID)

I would like to add Gender as a covariate of no interest. I have no theoretical reason to assume it affects anything, but it's really imbalanced across groups, so I'd like to include it. The first part of my question is: What is the best way to do this?

Is it this model:

RT ~ 1 + A*B*Groups + Gender + (1+A | Subject ID)

or this:

RT ~ 1 + A*B*Groups*Gender + (1+A | Subject ID)

? Or some other way? My worry about this second model is that it somewhat unreasonably inflates the number of terms in the model. Plus, I'm worried about overfitting.

The second part of my question: When selecting the best model, when should I add the covariate to see if it makes any difference at all? Let me explain what I mean.

Let's say I start with the simplest model I mentioned above, but without the slope for A, so this:

RT ~ 1 + A*B*Groups + (1| Subject ID)

Should I add the covariate first, either as a main effect ( + Gender) or as part of the interaction (*Gender), and then see if adding a slope for A makes a difference (by using the anova() function), or can I go ahead with adding the slope (which is theoretically more important) first, and then see if gender matters at all?

Here are some suggestions regarding your two questions.

  1. I would recommend an iterative modelling strategy.

    Start with

     RT ~ 1 + A*B*Groups*Gender + (1+A | Subject ID) 

    and see if the problem is tractable. The model above includes both the additive effects and all interaction terms between A, B, Groups and Gender.

    If the problem is not tractable, discard the interaction terms between Gender and the other predictors, and fit

     RT ~ 1 + A*B*Groups + Gender + (1+A | Subject ID) 

    It's difficult to make a statement about potential overfitting without any details on the number of observations.

  2. Concerning your second question: Generally, I would recommend a Bayesian approach; take a look at the rstan-based brms R package, which lets you use the same lme4/glmm formula syntax, making it easy to translate models. Model comparison and predictive performance are very broad terms. There are various ways to explore and compare the predictive performance of these types of nested/hierarchical Bayesian models. See, for example, the papers by Piironen and Vehtari and by Vehtari and Ojanen.
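
To make the two candidate models from point 1 concrete, here is a minimal lme4 sketch on simulated data. Everything about the data is an assumption made up for illustration: the number of subjects and trials, the effect sizes, and the way Gender is imbalanced across groups. I also use `Subject` as the grouping column, since a name with a space such as "Subject ID" would need backticks in an R formula. The models are fitted with maximum likelihood (REML = FALSE) so that anova() can compare fits that differ in their fixed effects.

```r
library(lme4)

set.seed(1)

# Hypothetical design: 40 subjects in 4 groups, Gender imbalanced across groups
n_subj <- 40
subjects <- data.frame(
  Subject = factor(seq_len(n_subj)),
  Groups  = factor(rep(paste0("G", 1:4), each = n_subj / 4)),
  Gender  = factor(rep(rep(c("F", "M"), 4), times = c(8, 2, 6, 4, 4, 6, 2, 8))),
  u0      = rnorm(n_subj, 0, 50),  # made-up random intercepts (ms)
  u1      = rnorm(n_subj, 0, 20)   # made-up random slopes for A (ms)
)

# 2 x 2 within-subject design with 20 trials per cell
trials <- expand.grid(Subject = subjects$Subject,
                      A = c("a1", "a2"), B = c("b1", "b2"), rep = 1:20)
dat <- merge(trials, subjects, by = "Subject")

# Simulated reaction times; Gender has no true effect here
a2 <- as.numeric(dat$A == "a2")
b2 <- as.numeric(dat$B == "b2")
dat$RT <- 500 + 30 * a2 + 20 * b2 + 15 * a2 * b2 +
  dat$u0 + dat$u1 * a2 + rnorm(nrow(dat), 0, 80)

# Baseline, additive-covariate and full-interaction models
m_base <- lmer(RT ~ 1 + A * B * Groups + (1 + A | Subject),
               data = dat, REML = FALSE)
m_add  <- lmer(RT ~ 1 + A * B * Groups + Gender + (1 + A | Subject),
               data = dat, REML = FALSE)
m_full <- lmer(RT ~ 1 + A * B * Groups * Gender + (1 + A | Subject),
               data = dat, REML = FALSE)

# Likelihood-ratio comparison of the nested models
comparison <- anova(m_base, m_add, m_full)
print(comparison)
```

In this setup the full model estimates twice as many fixed-effect coefficients as the baseline, which is the inflation in terms the question worries about. If you go the brms route, the same formulas should carry over to brm(), though I have not run that here.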
