
Repeated-measures / within-subjects ANOVA in R

I'm attempting to run a repeated-measures ANOVA using R. I've gone through various examples on various websites, but they never seem to talk about the error that I'm encountering. I assume I'm misunderstanding something important.

The ANOVA I'm trying to run is on some data from an experiment using human participants. It has one DV and three IVs. All of the levels of all of the IVs are run on all participants, making it a three-way repeated-measures / within-subjects ANOVA.

The code I'm running in R is as follows:

aov.output = aov(DV~ IV1 * IV2 * IV3 + Error(PARTICIPANT_ID / (IV1 * IV2 * IV3)),
                 data=fulldata)

When I run this, I get the following warning:

Error() model is singular

Any ideas what I might be doing wrong?
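A quick way to confirm that a data frame really has the fully crossed layout described above is to tabulate the design columns. The sketch below uses a made-up data frame (`fulldata_demo`, 4 participants and three 2-level IVs, all hypothetical) standing in for `fulldata`. Empty cells, or design columns that are not factors, are worth ruling out before interpreting the warning, although this check alone does not establish its cause.

```r
# Hypothetical stand-in for `fulldata`: 4 participants, three 2-level IVs,
# one observation per participant x IV1 x IV2 x IV3 cell.
set.seed(1)
fulldata_demo <- expand.grid(
  PARTICIPANT_ID = factor(1:4),
  IV1 = factor(c("a", "b")),
  IV2 = factor(c("x", "y")),
  IV3 = factor(c("lo", "hi"))
)
fulldata_demo$DV <- rnorm(nrow(fulldata_demo))

# The participant ID and the IVs should all be factors,
# not numeric or character columns.
design_cols <- c("PARTICIPANT_ID", "IV1", "IV2", "IV3")
all_factors <- all(vapply(fulldata_demo[design_cols], is.factor, logical(1)))

# Count the observations in each participant x condition cell.
cell_counts <- table(fulldata_demo[design_cols])
fully_crossed <- all(cell_counts > 0)                     # no empty cells
balanced <- length(unique(as.vector(cell_counts))) == 1   # same count per cell

print(c(all_factors = all_factors,
        fully_crossed = fully_crossed,
        balanced = balanced))
```

Replacing `fulldata_demo` with the real data frame runs the same three checks on the actual design.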

Try using the lmer function in the lme4 package; the aov function is probably not appropriate here. Look for references from Douglas Bates, e.g. http://lme4.r-forge.r-project.org/book/Ch4.pdf (the other chapters are great too, but that is the repeated-measures chapter; the intro is at http://lme4.r-forge.r-project.org/book/Ch1.pdf ). The R code is at the same place. For longitudinal data, it seems to be generally considered wrong these days to just fit OLS instead of a variance-components model such as those in the lme4 package or in nlme, and nlme seems to me to have been wildly overtaken by lme4 in popularity recently. You may note that Brian Ripley's post referenced in the comments section above also just recommends switching to lme.

By the way, a huge advantage right from the start is that you will be able to get estimates for the level of each effect as adjustments to the grand mean, using the typical syntax:

lmer(DV ~ 1 + IV1*IV2*IV3 + (IV1*IV2*IV3 | Subject), data = dataset)

Note that your random effects will be vector-valued.
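To give a sense of what "vector-valued" means in practice, here is a small sketch that counts how many random coefficients each subject gets under a term like `(IV1*IV2*IV3 | Subject)`, and how many variance and covariance parameters an unstructured covariance matrix for them would need. The level counts in `iv_levels` are hypothetical, since the question does not state them.

```r
# Hypothetical numbers of levels for IV1, IV2, IV3 (not given in the question).
iv_levels <- c(IV1 = 2, IV2 = 2, IV3 = 2)

# (IV1*IV2*IV3 | Subject) gives each subject one random coefficient per
# column of the IV1*IV2*IV3 model matrix, i.e. the product of the level counts.
coefs_per_subject <- prod(iv_levels)

# An unstructured covariance matrix for k coefficients has
# k variances and k * (k - 1) / 2 covariances.
n_cov_params <- coefs_per_subject * (coefs_per_subject + 1) / 2

print(c(coefs_per_subject = coefs_per_subject, n_cov_params = n_cov_params))
```

The count grows quickly with the number of levels, which is worth keeping in mind when the number of subjects is small.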

I know an answer has already been chosen for this post. I still wish to point out how to specify a correct error term/random effect when fitting an aov or lmer model to multi-way repeated-measures data. I assume that both independent variables (IVs) are fixed, and are crossed with each other and with subjects, meaning all subjects are exposed to all combinations of the IVs. I am going to use data taken from Kirk's Experimental Design: Procedures for the Behavioral Sciences (2013).

library(lme4)
library(foreign)
library(lmerTest)
library(dplyr)

file_name <- "http://www.ats.ucla.edu/stat/stata/examples/kirk/rbf33.dta" #1
d <- read.dta(file_name) %>%                                              #2
  mutate(a_f = factor(a), b_f = factor(b), s_f = factor(s))               #3

head(d)
    ##   a b s  y a_f b_f s_f
    ## 1 1 1 1 37   1   1   1
    ## 2 1 2 1 43   1   2   1
    ## 3 1 3 1 48   1   3   1
    ## 4 2 1 1 39   2   1   1
    ## 5 2 2 1 35   2   2   1

In this study, 5 subjects (s) are exposed to 2 treatments, type of beat (a) and training duration (b), with 3 levels each. The outcome variable is attitude toward minorities. In #3 I converted a, b, and s into the factor variables a_f, b_f, and s_f. Let p and q be the numbers of levels of a_f and b_f (3 each), and n be the sample size (5).

In this example the degrees of freedom (dfs) for the tests of a_f, b_f, and their interaction should be p - 1 = 2, q - 1 = 2, and (p - 1)(q - 1) = 4, respectively. The df for the s_f error term is n - 1 = 4, and the df for the Within (s_f:a_f:b_f) error term is (n - 1)(pq - 1) = 32. So the correct model(s) should give you these dfs.
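The df bookkeeping above can be reproduced with a few lines of arithmetic. This is only a sketch of the formulas as stated, with p, q, and n set to the values from the example; the variable names are chosen here for illustration.

```r
p <- 3  # levels of a_f
q <- 3  # levels of b_f
n <- 5  # number of subjects

df_a        <- p - 1                 # main effect of a_f
df_b        <- q - 1                 # main effect of b_f
df_ab       <- (p - 1) * (q - 1)     # a_f:b_f interaction
df_subjects <- n - 1                 # s_f error term
df_within   <- (n - 1) * (p * q - 1) # Within (s_f:a_f:b_f) error term

# With one observation per subject x a x b cell, the pieces
# should add up to the total df, n * p * q - 1.
df_total <- n * p * q - 1
stopifnot(df_a + df_b + df_ab + df_subjects + df_within == df_total)

print(c(a = df_a, b = df_b, ab = df_ab,
        subjects = df_subjects, within = df_within, total = df_total))
```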

Using aov

Now let's try different model specifications using aov:

aov(y ~ a_f*b_f + Error(s_f), data=d) %>% summary()         # m1

aov(y ~ a_f*b_f + Error(s_f/a_f:b_f), data=d) %>% summary() # m2

aov(y ~ a_f*b_f + Error(s_f/a_f*b_f), data=d) %>% summary() # m3

Simply specifying the error as Error(s_f) in m1 gives you the correct dfs and F-ratios, matching the values in the book. m2 gives the same values as m1, but also the infamous "Warning: Error() model is singular". m3 is doing something strange. It further partitions the Within residuals of m1 (634.9) into residuals for three error terms: s_f:a_f (174.2), s_f:b_f (173.6), and s_f:a_f:b_f (287.1). This is wrong, since we would not get three error terms when we run a 2-way between-subjects ANOVA! Multiple error terms are also contrary to the point of using block factorial designs, which allow us to use the same error term for the tests of A, B, and AB, unlike split-plot designs, which require 2 error terms.
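To see which terms each Error() expression expands into, base R's terms() can be applied to the same right-hand sides. This is a rough sketch: the helper `expand_error` is a name made up here, it only shows the formula expansion (not how aov then builds its strata), and the parenthesised form `s_f/(a_f*b_f)` is added purely for comparison with the form used in the question. The last lines check that the three residual sums of squares quoted above add up to the m1 Within residual, and that their dfs add up to 32.

```r
# Hypothetical helper: list the terms a one-sided formula expands into.
expand_error <- function(f) attr(terms(f), "term.labels")

m1_strata <- expand_error(~ s_f)
m2_strata <- expand_error(~ s_f / a_f:b_f)
m3_strata <- expand_error(~ s_f / a_f * b_f)
# Parenthesised variant, as in the question's Error(ID / (IV1 * IV2 * IV3)).
nested_strata <- expand_error(~ s_f / (a_f * b_f))

print(list(m1 = m1_strata, m2 = m2_strata,
           m3 = m3_strata, nested = nested_strata))

# Residual sums of squares reported for m3 and the m1 Within residual.
ss_m3 <- c("s_f:a_f" = 174.2, "s_f:b_f" = 173.6, "s_f:a_f:b_f" = 287.1)
ss_within_m1 <- 634.9
ss_match <- isTRUE(all.equal(sum(ss_m3), ss_within_m1))

# Matching dfs with n = 5, p = 3, q = 3.
n <- 5; p <- 3; q <- 3
df_m3 <- c((n - 1) * (p - 1), (n - 1) * (q - 1), (n - 1) * (p - 1) * (q - 1))
df_match <- sum(df_m3) == (n - 1) * (p * q - 1)

print(c(ss_match = ss_match, df_match = df_match))
```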

Using lmer

How can we get the same dfs and F-values using lmer? If your data is balanced, the denominator-df approximations available in lmerTest (Satterthwaite by default, or Kenward-Roger via the ddf argument of anova()) will give you the exact dfs and F-ratios.

lmer(y ~ a_f*b_f + (1|s_f), data=d) %>% anova()         # mem1

lmer(y ~ a_f*b_f + (1|s_f/a_f:b_f), data=d) %>% anova() # mem2

lmer(y ~ a_f*b_f + (1|s_f/a_f*b_f), data=d) %>% anova() # mem3

lmer(y ~ a_f*b_f + (1|s_f:a_f:b_f), data=d) %>% anova() # mem4

lmer(y ~ a_f*b_f + (a_f*b_f|s_f), data=d) %>% anova()   # mem5

Again, simply specifying the random effect as (1|s_f) gives you the correct dfs and F-ratios (mem1). mem2-5 did not even give results, presumably because the number of random effects they needed to estimate was greater than the sample size.
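The guess about mem4 and mem5 can be made concrete by counting random effects against observations. The sketch below assumes one observation per subject x a_f x b_f cell (45 rows), which is what the dfs above imply. The idea that lmer stops when the random effects in a term are at least as numerous as the observations is stated here as an assumption about lme4's default checks, not something shown in the original output.

```r
n <- 5  # subjects
p <- 3  # levels of a_f
q <- 3  # levels of b_f

# One row per subject x a_f x b_f cell (assumed from the dfs above).
n_obs <- n * p * q

re_mem1 <- n            # (1 | s_f): one intercept per subject
re_mem4 <- n * p * q    # (1 | s_f:a_f:b_f): one intercept per cell
re_mem5 <- n * (p * q)  # (a_f*b_f | s_f): p * q coefficients per subject

# TRUE means the term leaves observations to spare for the residual variance.
identifiable <- c(mem1 = re_mem1, mem4 = re_mem4, mem5 = re_mem5) < n_obs
print(identifiable)
```

Under these assumptions only mem1 has fewer random effects than observations, which is in line with it being the one model that returned results.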
