
Repeated measures ANOVA without homogeneous variance in R?

I have a dataset of animal species diversity observed in 3 transects each month for a little more than 2 years. I want to find out whether the transects differ significantly in animal diversity. For such a simple question a one-way ANOVA is almost the answer; however, I think a repeated measures ANOVA that incorporates the monthly changes in animal diversity is probably necessary to control for the fairly large seasonal fluctuations.

My dataset is below, along with a plot of the faunal diversity over time.

  transect<-c(rep("transA",26),rep("transB",25),rep("transC",25))
  months<-as.numeric(c(1:26,1:11,13:26,0,2,4:26))
  animal_species<-c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
  animal_df<-data.frame(transect,months,animal_species)

library(ggplot2)
  ggplot(animal_df,aes(months,animal_species))+geom_bar(stat='identity')+theme_bw()+facet_grid(transect~.)

BUT there are two problems that violate the assumptions of ANOVA!

The first is that the variance in species numbers differs widely between transects, and according to a Levene's (median) test, the variances are not the same.

library(car)  # provides leveneTest()
animal_AOV<-aov(animal_species~transect, data=animal_df)
leveneTest(animal_AOV)

# Levene's Test for Homogeneity of Variance (center = median)
#        Df F value    Pr(>F)    
# group  2  10.783 7.889e-05 ***
#      73  

The second is that the data seem to follow different distributions, as is probably most easily seen from the histograms of diversity per transect, where transA seems to have less skew than the other two.

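# Split the data by transect for the per-transect histograms
TransA<-subset(animal_df,transect=="transA")
TransB<-subset(animal_df,transect=="transB")
TransC<-subset(animal_df,transect=="transC")
par(mfrow=c(3,1))
  hist(TransA$animal_species,breaks=14,xlim=c(0,14))
  hist(TransB$animal_species,breaks=10,xlim=c(0,14))
  hist(TransC$animal_species,breaks=10,xlim=c(0,14))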

My questions to the community are:

  1. Am I correct in thinking that the repeated measures approach is the most sensible analysis pathway?

  2. Are the departures from the assumptions of ANOVA large enough to worry about, seeing as there are more than 20 observations per transect and the numbers of observations are relatively well balanced?

  3. How should such an analysis be coded to produce a viable answer (possibly taking the violations into account)? Much of the information online about repeated measures ANOVA seems to conflict on how such an analysis should be put together.

I have essentially a simple question, and my hunch is that the result should be that the three transects differ significantly from one another (or at least that transA has higher diversity than the other two). Does anyone have any suggestions for how to tackle this?

The skewness can be explained by the fact that you are using count data. Count data usually follow a Poisson distribution, not a normal distribution. So ideally you would use some sort of Poisson regression combined with random effects for the repeated measures.

For more extensive information I would advise you to speak to a statistician or google 'Mixed-effects Poisson Regression Model'.
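As a rough first look at whether a Poisson-type model is plausible, one can compare the mean and variance of the counts within each transect, since a Poisson variable has variance equal to its mean. This is only a sketch on the marginal data (it ignores the month-to-month effects), and the name `stats_by_transect` is introduced here for illustration.

```r
# Data from the question
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)

# Mean, variance and variance/mean ratio per transect.
# Under a Poisson model the ratio would be roughly 1; the raw ratio also
# absorbs seasonal variation, so treat it as a crude screen only.
stats_by_transect <- sapply(split(animal_species, transect), function(x) {
  c(mean = mean(x), variance = var(x), ratio = var(x) / mean(x))
})
print(round(stats_by_transect, 2))
```

A larger variance in the transect with the larger mean is what count data typically produce, which is one reason Levene's test on the raw scale can flag unequal variances.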

Two general issues:

  • @Koot6133 is correct that you should be thinking about a model for count data, which typically operates on a log scale (thus reducing skew and differences in variance)
  • you need to be thinking about the conditional distribution of your data (i.e. the distribution once the effects of date, etc. are factored out), not the marginal distribution. This means that for the most part you don't worry about what the distribution looks like until after you have fitted the model
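To illustrate the marginal-versus-conditional distinction, the sketch below compares a histogram of the raw counts with a histogram of Pearson residuals from a fitted model. It assumes a plain fixed-effects Poisson GLM with month as a factor, chosen here only because it runs in base R; it is a stand-in for illustration, not the mixed model itself.

```r
# Data from the question
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- as.numeric(c(1:26, 1:11, 13:26, 0, 2, 4:26))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df <- data.frame(transect, months, animal_species)

# Illustrative fixed-effects Poisson GLM (assumption: month as a factor)
fit <- glm(animal_species ~ transect + factor(months),
           family = poisson, data = animal_df)

# Pearson residuals: what is left once transect and month are factored out
pearson_res <- residuals(fit, type = "pearson")

par(mfrow = c(1, 2))
hist(animal_df$animal_species, main = "Marginal: raw counts",
     xlab = "species count")
hist(pearson_res, main = "Conditional: Pearson residuals",
     xlab = "residual")
```

The left panel is what the per-transect histograms in the question show; the right panel is closer to what the model assumptions are actually about.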

Personal preference for line plots, since you can then overlay the data and compare them more effectively:

ggplot(animal_df,aes(months,animal_species,colour=transect))+
    geom_line()+theme_bw()+scale_y_log10()
ggsave("animal1.png")


Any zero counts drop out when plotting on a log scale, but the plot does make it clearer that the transects don't differ much in variance on this scale.

Use the lme4 package to fit a repeated measures/longitudinal Poisson GLMM:

library(lme4)
m1 <- glmer(animal_species~transect+(1|months),
            family=poisson,data=animal_df)

Check for overdispersion (the ratio is <1, so no problem):

deviance(m1)/df.residual(m1) ## 0.65
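A commonly used alternative is the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below refits the same model so that it runs on its own; it assumes the lme4 package is installed, and `pearson_ratio` is a name introduced here for illustration.

```r
library(lme4)

# Data from the question
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- as.numeric(c(1:26, 1:11, 13:26, 0, 2, 4:26))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df <- data.frame(transect, months, animal_species)

# Same Poisson GLMM as above: random intercept per month
m1 <- glmer(animal_species ~ transect + (1 | months),
            family = poisson, data = animal_df)

# Pearson chi-square statistic over residual df;
# values well above 1 would suggest overdispersion
pearson_ratio <- sum(residuals(m1, type = "pearson")^2) / df.residual(m1)
print(pearson_ratio)
```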

Results:

# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [
#   glmerMod]
# Family: poisson  ( log )
# Formula: animal_species ~ transect + (1 | months)
# Data: animal_df
# AIC       BIC    logLik  deviance  df.resid 
# 319.3219  328.6449 -155.6610  311.3219        72 
# Random effects:
#   Groups Name        Std.Dev.
# months (Intercept) 0.3003  
# Number of obs: 76, groups:  months, 27
# Fixed Effects:
#   (Intercept)  transecttransB  transecttransC  
# 1.7110         -0.4792         -0.8847  
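Because the model uses a log link, the fixed effects above are on the log scale. As a small worked example, the sketch below exponentiates the printed estimates to get the expected count for transA and the rate ratios of transB and transC relative to transA, for a typical month (random effect of zero). The numbers are copied from the output above; the variable names are introduced here for illustration.

```r
# Fixed-effect estimates copied from the printed model output (log scale)
log_coefs <- c("(Intercept)"   = 1.7110,
               transecttransB = -0.4792,
               transecttransC = -0.8847)

# Back-transform: the intercept becomes the expected count for transA,
# the other two become rate ratios relative to transA
rate_scale <- exp(log_coefs)
print(round(rate_scale, 2))

# Expected counts per transect in a typical month (random effect = 0)
expected <- exp(c(transA = 1.7110,
                  transB = 1.7110 - 0.4792,
                  transC = 1.7110 - 0.8847))
print(round(expected, 2))
```

On this reading, transA is expected to have roughly 5.5 species in a typical month, with transB at about 62% and transC at about 41% of that level.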

Check the scale-location plot:

png("animal2.png")
plot(m1,sqrt(abs(resid(.)))~fitted(.),
     type=c("p","smooth"),col=animal_df$transect)
dev.off()


No apparent change in variance across groups/number of counts ...

Overlay the results on the data (original scale this time):

pp <- animal_df
pp$animal_species <- predict(m1,type="response")
ggplot(animal_df,aes(months,animal_species,colour=transect))+
  geom_point()+
  geom_line(data=pp)+theme_bw()
ggsave("animal3.png")
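To address the original question of whether the transects differ overall, one possible approach is a likelihood ratio test comparing the model with and without the transect term. This is a sketch that assumes lme4 is installed; the null model `m0` and the object `lrt` are names introduced here, and other tests (for example, pairwise comparisons) may suit the question better.

```r
library(lme4)

# Data from the question
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- as.numeric(c(1:26, 1:11, 13:26, 0, 2, 4:26))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df <- data.frame(transect, months, animal_species)

# Full model (with transect) and null model (without), same random effect
m1 <- glmer(animal_species ~ transect + (1 | months),
            family = poisson, data = animal_df)
m0 <- glmer(animal_species ~ 1 + (1 | months),
            family = poisson, data = animal_df)

# Likelihood ratio test for the overall transect effect
lrt <- anova(m0, m1)
print(lrt)
```

A small p-value in the `Pr(>Chisq)` column would indicate that including transect improves the fit, i.e. that the transects differ in expected species counts after allowing for month-to-month variation.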

