
Repeated-measures ANOVA without homogeneous variance in R?

I have a dataset of animal species diversity observed in 3 transects each month over (a little more than) 2 years. I want to find out whether the transects differ significantly from each other in animal diversity. For such a simple question a one-way ANOVA would almost be the answer; however, I think a repeated-measures ANOVA that incorporates the monthly changes in animal diversity is probably necessary to control for the fairly large seasonal fluctuations.

My dataset is below, along with code to plot faunal diversity over time:

transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- as.numeric(c(1:26, 1:11, 13:26, 0, 2, 4:26))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,  # transA
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,     # transB
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)     # transC
animal_df <- data.frame(transect, months, animal_species)

library(ggplot2)
ggplot(animal_df, aes(months, animal_species)) +
  geom_bar(stat = "identity") + theme_bw() + facet_grid(transect ~ .)

But there are two further problems, both of which violate the assumptions of ANOVA.

The first is that the variance in the number of species differs widely between transects: according to Levene's test (centred on the median), the variances are not equal.

library(car)  # leveneTest() comes from the car package
animal_AOV <- aov(animal_species ~ transect, data = animal_df)
leveneTest(animal_AOV)

# Levene's Test for Homogeneity of Variance (center = median)
#        Df F value    Pr(>F)    
# group  2  10.783 7.889e-05 ***
#      73  
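As a cross-check, Levene's test with `center = median` (often called the Brown-Forsythe test) is simply a one-way ANOVA on the absolute deviations of each observation from its own group's median, which is also how `car::leveneTest()` computes it. A minimal base-R sketch, recreating the question's data so it runs on its own:

```r
# Data from the question, recreated so this block runs on its own
transect <- factor(c(rep("transA", 26), rep("transB", 25), rep("transC", 25)))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)

# Step 1: absolute deviation of each count from its own transect's median
group_median <- ave(animal_species, transect, FUN = median)
abs_dev <- abs(animal_species - group_median)

# Step 2: ordinary one-way ANOVA on those absolute deviations
bf_table <- anova(lm(abs_dev ~ transect))
print(bf_table)
```

On the data as posted this gives F ≈ 10.78 on 2 and 73 degrees of freedom, the same as the `leveneTest()` output above.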

The second is that the data seem to follow different distributions, which is probably easiest to see in the histograms of diversity per transect: transA seems less skewed than the other two.

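## Split the data by transect for the per-transect histograms
TransA <- subset(animal_df, transect == "transA")
TransB <- subset(animal_df, transect == "transB")
TransC <- subset(animal_df, transect == "transC")
par(mfrow = c(3, 1))
hist(TransA$animal_species, breaks = 14, xlim = c(0, 14))
hist(TransB$animal_species, breaks = 10, xlim = c(0, 14))
hist(TransC$animal_species, breaks = 10, xlim = c(0, 14))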

My questions to the community are:

  1. Am I correct in thinking that the repeated-measures approach is the most sensible analysis pathway?

  2. Are the departures from the ANOVA assumptions enough to worry about, given that there are more than 20 observations per transect and the group sizes are relatively well balanced?

  3. How should such an analysis be coded to produce a viable answer (possibly taking the violations into account)? Much of the information online about repeated-measures ANOVA seems to conflict on how such an analysis should be put together.

Essentially I have a simple question, and my hunch is that the three transects will turn out to be significantly different from one another (at least transA having higher diversity than the other two). Does anyone have any suggestions for how to tackle this?

The skewness can be explained by the fact that you are using count data. Count data often follow a Poisson distribution rather than a normal distribution, so ideally you would use some form of Poisson regression combined with a random effect for the repeated measures.
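To see why the lower-diversity transects look more skewed: a Poisson distribution with mean λ has variance λ and skewness 1/√λ, so skew shrinks as the mean grows. A small base-R sketch, plugging in each transect's overall mean as a rough rate (an approximation that ignores the seasonal swings) and checking the formula against simulated Poisson draws:

```r
# Data from the question, recreated so this block runs on its own
transect <- factor(c(rep("transA", 26), rep("transB", 25), rep("transC", 25)))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)

# Overall mean count per transect, used as a rough Poisson rate
lambda <- tapply(animal_species, transect, mean)

# Theoretical Poisson skewness is 1/sqrt(lambda): lower mean, more skew
theoretical_skew <- 1 / sqrt(lambda)
print(round(theoretical_skew, 2))

# Moment-based sample skewness
sample_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Check the formula against large simulated Poisson samples at each rate
set.seed(42)
sim_skew <- sapply(lambda, function(l) sample_skew(rpois(1e6, l)))
print(round(sim_skew, 2))
```

For these means the theoretical skewness is about 0.41 for transA, 0.53 for transB and 0.64 for transC, which is the pattern the histograms suggest (transA least skewed).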

For more extensive information, I would advise you to speak to a statistician or to search for 'Mixed-effects Poisson Regression Model'.

Two general issues:

  • @Koot6133 is correct that you should be thinking about a model for count data, which typically operates on a log scale (thus reducing skew and differences in variance)
  • you need to be thinking about the conditional distribution of your data (i.e. the distribution once the effects of date, etc. are factored out), not the marginal distribution; this means that for the most part you don't worry about what the distribution looks like until after you have fitted the model
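A quick descriptive illustration of the first point, comparing within-transect variances on the raw count scale and on the log scale. This is only a marginal check (per the second point, the distribution that really matters is the conditional one after fitting), and it relies on every count in these data being at least 1 so that `log()` is finite:

```r
# Data from the question, recreated so this block runs on its own
transect <- factor(c(rep("transA", 26), rep("transB", 25), rep("transC", 25)))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)

# log() needs strictly positive counts; zeros would need e.g. log1p() instead
stopifnot(all(animal_species > 0))

raw_var <- tapply(animal_species, transect, var)
log_var <- tapply(log(animal_species), transect, var)

# Ratio of the largest to the smallest within-transect variance on each scale
spread_ratio <- c(raw = max(raw_var) / min(raw_var),
                  log = max(log_var) / min(log_var))
print(round(rbind(raw_var, log_var), 3))
print(round(spread_ratio, 2))
```

With these data the largest-to-smallest variance ratio drops from about 6.7 on the count scale to about 1.9 on the log scale.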

My personal preference is for line plots, since then you can overlay the transects and compare them more effectively:

ggplot(animal_df,aes(months,animal_species,colour=transect))+
    geom_line()+theme_bw()+scale_y_log10()
ggsave("animal1.png")

[Figure: animal1.png]

Zero counts would disappear when plotted on a log scale (though the data as posted appear to contain none), but the log scale does make it clearer that the transects don't differ much in variance on that scale.

Use the lme4 package to fit a repeated measures/longitudinal Poisson GLMM:

library(lme4)
m1 <- glmer(animal_species~transect+(1|months),
            family=poisson,data=animal_df)
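Written out, the model above is a Poisson GLMM with a random intercept for each month. In the notation below (introduced here for illustration), $y_{tm}$ is the species count on transect $t$ in month $m$, and $\beta_{\text{transA}} = 0$ because R's default treatment contrasts use transA as the reference level:

$$
\begin{aligned}
y_{tm} \mid b_m &\sim \operatorname{Poisson}(\mu_{tm}), \\
\log \mu_{tm} &= \beta_0 + \beta_t + b_m, \\
b_m &\sim \mathcal{N}(0, \sigma^2_{\text{month}}).
\end{aligned}
$$

The shared month effect $b_m$ absorbs the seasonal shifts common to all three transects (assumed additive on the log scale), so the $\beta_t$ compare transects within the same month. In the output below, $\hat\beta_0 \approx 1.71$ and $\hat\sigma_{\text{month}} \approx 0.30$.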

Check for overdispersion (the ratio is < 1, so no problem):

deviance(m1)/df.residual(m1) ## 0.65
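The deviance ratio is one quick heuristic; another common check is the Pearson statistic (sum of squared Pearson residuals divided by the residual degrees of freedom), where values well above 1 suggest overdispersion. A minimal base-R sketch; so that it runs without lme4, it uses an ordinary Poisson `glm()` with month as a fixed factor as a stand-in for `m1` (an assumption, not the answer's model). The same helper should also accept a `glmer` fit, since lme4 provides `residuals(..., type = "pearson")` and `df.residual()` for those models:

```r
# Data from the question, recreated so this block runs on its own
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- c(1:26, 1:11, 13:26, 0, 2, 4:26)
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df <- data.frame(transect, months, animal_species)

# Pearson dispersion: sum of squared Pearson residuals / residual df
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

# Stand-in model: month as a fixed factor instead of a random effect
g1 <- glm(animal_species ~ transect + factor(months),
          family = poisson, data = animal_df)
print(pearson_dispersion(g1))
```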

Results:

# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
#  Family: poisson  ( log )
# Formula: animal_species ~ transect + (1 | months)
#    Data: animal_df
#        AIC       BIC    logLik  deviance  df.resid
#   319.3219  328.6449 -155.6610  311.3219        72
# Random effects:
#  Groups Name        Std.Dev.
#  months (Intercept) 0.3003
# Number of obs: 76, groups:  months, 27
# Fixed Effects:
#    (Intercept)  transecttransB  transecttransC
#         1.7110         -0.4792         -0.8847
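Because the model uses a log link, the fixed effects are easier to read after exponentiating: each becomes a rate ratio relative to transA. A small sketch using the estimates printed above, hard-coded so it runs without refitting (with the fitted model, `exp(fixef(m1))` from lme4 gives the same numbers):

```r
# Fixed-effect estimates copied from the glmer output above (log scale)
est <- c(transB = -0.4792, transC = -0.8847)

# Rate ratios relative to transA (for a given month)
rate_ratio <- exp(est)
print(round(rate_ratio, 3))

# transC relative to transB: difference on the log scale, then exponentiate
print(round(exp(est[["transC"]] - est[["transB"]]), 3))
```

This gives ratios of about 0.619 and 0.413, i.e. transB has roughly 38% and transC roughly 59% fewer species than transA in a given month, and transC roughly a third fewer than transB (ratio about 0.667).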

Check the location-scale plot:

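png("animal2.png")
## col needs colour codes, not the character transect labels (data.frame() keeps strings as character in R >= 4.0)
plot(m1, sqrt(abs(resid(.))) ~ fitted(.),
     type = c("p", "smooth"), col = as.integer(factor(animal_df$transect)))
dev.off()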

[Figure: animal2.png]

No apparent change in variance across groups or with the size of the counts.

Overlay the results on the data (original scale this time):

pp <- animal_df
pp$animal_species <- predict(m1,type="response")
ggplot(animal_df,aes(months,animal_species,colour=transect))+
  geom_point()+
  geom_line(data=pp)+theme_bw()
ggsave("animal3.png")

[Figure: animal3.png]
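To test the original question (do the transects differ?) directly, one option is a likelihood-ratio test comparing models with and without `transect`; with lme4 that comparison would be `anova(update(m1, . ~ . - transect), m1)`. The base-R sketch below makes the same kind of comparison with a fixed-effects stand-in (month as a factor, an assumption so the block runs without lme4):

```r
# Data from the question, recreated so this block runs on its own
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- c(1:26, 1:11, 13:26, 0, 2, 4:26)
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df <- data.frame(transect, months, animal_species)

# Full model and the same model without the transect term
g_full <- glm(animal_species ~ transect + factor(months),
              family = poisson, data = animal_df)
g_null <- update(g_full, . ~ . - transect)

# Likelihood-ratio (chi-squared) test for the 2 transect parameters
lrt <- anova(g_null, g_full, test = "Chisq")
print(lrt)
```

Treating month as fixed spends 26 extra parameters and does not pool information across months the way the random effect in `m1` does, so this is a rough check rather than a replacement for the GLMM.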
