How to decide when and how to include covariates in a linear mixed-effects model in lme4

I am running a linear mixed-effects model in R, and I'm not sure how to include a covariate of no interest in the model, or even how to decide whether I should do that.

I have two within-subject variables, let's call them A and B, with two levels each and many observations per participant. I'm interested in how their interaction changes across 4 groups. My outcome is reaction time. At the simplest level, I have this model:

RT ~ 1 + A*B*Groups + (1 + A | SubjectID)

I would like to add Gender as a covariate of no interest. I have no theoretical reason to assume it affects anything, but it is strongly imbalanced across groups, so I'd like to include it. The first part of my question is: what is the best way to do this?

Is it this model:

RT ~ 1 + A*B*Groups + Gender + (1 + A | SubjectID)

or this:

RT ~ 1 + A*B*Groups*Gender + (1 + A | SubjectID)

Or some other way? My worry about this second model is that it inflates the number of terms in the model somewhat unreasonably. I'm also worried about overfitting.
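To put numbers on the concern about the size of the second model, the fixed-effect design matrices can be compared directly. This sketch is an illustration, not part of the original post: it assumes a fully crossed design built with expand.grid(), hypothetical level labels, and R's default treatment contrasts, and it counts columns with model.matrix().

```r
# Hypothetical fully crossed design: A and B (2 levels each), 4 groups, 2 genders.
# expand.grid() turns character vectors into factors by default.
design <- expand.grid(
  A      = c("a1", "a2"),
  B      = c("b1", "b2"),
  Groups = c("g1", "g2", "g3", "g4"),
  Gender = c("f", "m")
)

# Number of fixed-effect coefficients (model.matrix columns) for a right-hand side.
# Random-effect terms do not enter model.matrix(), so only the fixed parts are shown.
n_fixed <- function(rhs) ncol(model.matrix(rhs, data = design))

n_fixed(~ 1 + A*B*Groups)           # 16
n_fixed(~ 1 + A*B*Groups + Gender)  # 17
n_fixed(~ 1 + A*B*Groups*Gender)    # 32
```

Adding Gender as a main effect costs one coefficient, whereas crossing it with A*B*Groups doubles the fixed-effect part from 16 to 32 coefficients.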

The second part of my question: when selecting the best model, at what point should I add the covariate to see whether it makes any difference at all? Let me explain what I mean.

Let's say I start with the simplest model mentioned above, but without the random slope for A:

RT ~ 1 + A*B*Groups + (1 | SubjectID)

Should I add the covariate first, either as a main effect (+ Gender) or as part of the interaction (*Gender), and then test whether adding a random slope for A makes a difference (using the anova() function)? Or can I add the slope first (which is theoretically more important) and then check whether Gender matters at all?
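A minimal sketch of the second sequence in the question (random slope first, then Gender), under stated assumptions: the data are simulated, the column names SubjectID, A, B, Groups, Gender and RT and all effect sizes are hypothetical, and the lme4 package is installed. The models are fitted by maximum likelihood (REML = FALSE) so that nested models can be compared with anova(), which reports a likelihood-ratio test. This shows the mechanics only; it does not settle which order is better.

```r
library(lme4)

# Simulated data (hypothetical): 40 subjects in 4 groups, 20 trials per A x B cell.
set.seed(1)
n_subj <- 40; n_rep <- 20
subj <- data.frame(
  SubjectID = factor(seq_len(n_subj)),
  Groups    = factor(rep(paste0("g", 1:4), each = n_subj / 4)),
  Gender    = factor(sample(c("f", "m"), n_subj, replace = TRUE))
)
trials <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), trial = seq_len(n_rep))
d <- merge(subj, trials)  # no shared columns, so merge() returns the cross product
i  <- as.integer(d$SubjectID)
u0 <- rnorm(n_subj, 0, 50)  # subject-specific intercepts
u1 <- rnorm(n_subj, 0, 20)  # subject-specific slopes for A
d$RT <- 600 + u0[i] + (30 + u1[i]) * (d$A == "a2") + 20 * (d$B == "b2") +
  rnorm(nrow(d), 0, 80)

# Step 1: does a by-subject random slope for A improve the fit?
m0 <- lmer(RT ~ 1 + A*B*Groups + (1 | SubjectID), data = d, REML = FALSE)
m1 <- lmer(RT ~ 1 + A*B*Groups + (1 + A | SubjectID), data = d, REML = FALSE)
slope_test <- anova(m0, m1)

# Step 2: does adding Gender as a main effect improve the fit of that model?
m2 <- update(m1, . ~ . + Gender)
gender_test <- anova(m1, m2)

slope_test
gender_test
```

The other order (adding Gender to m0 first) uses the same calls. Because the random-slope test places a variance parameter on the boundary of its parameter space, the p-value from anova(m0, m1) tends to be conservative.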

Here are some suggestions regarding your two questions.

  1. I would recommend an iterative modelling strategy.

    Start with

     RT ~ 1 + A*B*Groups*Gender + (1 + A | SubjectID)

    and see whether the problem is tractable (for example, whether the fit converges). The above model includes all main effects as well as all interaction terms between A, B, Groups and Gender.

    If it is not, drop the interaction terms between Gender and the other covariates, and fit

     RT ~ 1 + A*B*Groups + Gender + (1 + A | SubjectID)

    It's difficult to say anything about potential overfitting without details on the number of observations.

  2. Concerning your second question: generally, I would recommend a Bayesian approach. Take a look at the rstan-based brms R package, which lets you use the same lme4/glmm formula syntax, making it easy to translate models. Model comparison and predictive performance are very broad terms; there are various ways to explore and compare the predictive performance of these types of nested/hierarchical Bayesian models. See, for example, the papers by Piironen and Vehtari and by Vehtari and Ojanen.
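The "start with the full model, simplify if it is not tractable" strategy from point 1 could be automated roughly as follows. This is a sketch under assumptions that are not in the answer: "tractable" is taken to mean that lmer() raises no warning (such as a convergence warning) and that the fit is not singular according to lme4's isSingular(); the data are simulated with hypothetical names; and fit_or_null() is a hypothetical helper.

```r
library(lme4)

# Simulated data (hypothetical), same layout as the question: 40 subjects, 4 groups.
set.seed(2)
n_subj <- 40; n_rep <- 20
subj <- data.frame(
  SubjectID = factor(seq_len(n_subj)),
  Groups    = factor(rep(paste0("g", 1:4), each = n_subj / 4)),
  Gender    = factor(sample(c("f", "m"), n_subj, replace = TRUE))
)
d <- merge(subj, expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), trial = seq_len(n_rep)))
i  <- as.integer(d$SubjectID)
u0 <- rnorm(n_subj, 0, 50)  # subject-specific intercepts
u1 <- rnorm(n_subj, 0, 20)  # subject-specific slopes for A
d$RT <- 600 + u0[i] + (30 + u1[i]) * (d$A == "a2") + rnorm(nrow(d), 0, 80)

# Hypothetical helper: fit a model; return NULL if lmer() warns or the fit is singular.
fit_or_null <- function(formula, data) {
  warned <- FALSE
  fit <- withCallingHandlers(
    lmer(formula, data = data),
    warning = function(w) {
      warned <<- TRUE                 # record that a warning occurred ...
      invokeRestart("muffleWarning")  # ... and continue fitting
    }
  )
  if (warned || isSingular(fit)) NULL else fit
}

full    <- RT ~ 1 + A*B*Groups*Gender + (1 + A | SubjectID)
reduced <- RT ~ 1 + A*B*Groups + Gender + (1 + A | SubjectID)

# Try the full model first; fall back to the additive-Gender model if needed.
m <- fit_or_null(full, d)
chosen <- "full"
if (is.null(m)) {
  m <- fit_or_null(reduced, d)
  chosen <- if (is.null(m)) "none" else "reduced"
}
chosen
```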
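For point 2, a minimal brms sketch, assuming brms and a working Stan toolchain are installed. The data are simulated with hypothetical names (with fewer trials than in the question, to keep sampling quick), brms's default priors are used purely for illustration, and predictive performance is compared with approximate leave-one-out cross-validation (PSIS-LOO from the loo package), which is one of several possible approaches.

```r
library(brms)

# Simulated data (hypothetical): 40 subjects in 4 groups, 5 trials per A x B cell.
set.seed(3)
n_subj <- 40; n_rep <- 5
subj <- data.frame(
  SubjectID = factor(seq_len(n_subj)),
  Groups    = factor(rep(paste0("g", 1:4), each = n_subj / 4)),
  Gender    = factor(sample(c("f", "m"), n_subj, replace = TRUE))
)
d <- merge(subj, expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), trial = seq_len(n_rep)))
i  <- as.integer(d$SubjectID)
u0 <- rnorm(n_subj, 0, 50)  # subject-specific intercepts
u1 <- rnorm(n_subj, 0, 20)  # subject-specific slopes for A
d$RT <- 600 + u0[i] + (30 + u1[i]) * (d$A == "a2") + rnorm(nrow(d), 0, 80)

# The lme4-style formulas carry over unchanged; chains and iter are kept small for speed.
b1 <- brm(RT ~ 1 + A*B*Groups + (1 + A | SubjectID), data = d,
          family = gaussian(), chains = 2, iter = 1000, seed = 1, refresh = 0)
b2 <- brm(RT ~ 1 + A*B*Groups + Gender + (1 + A | SubjectID), data = d,
          family = gaussian(), chains = 2, iter = 1000, seed = 1, refresh = 0)

# Compare estimated expected log predictive density (elpd);
# the first row is the model with the highest estimated elpd.
cmp <- loo::loo_compare(loo(b1), loo(b2))
cmp
```

In a real analysis the priors, and the number of chains and iterations, deserve more thought than this illustration gives them.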
