I have a dataset of animal species diversity observed in 3 transects each month over (a little more than) 2 years. I want to find out whether the transects differ significantly from each other in animal diversity. For such a simple question a one-way ANOVA is almost the answer; however, I think a repeated measures ANOVA that incorporates the monthly changes in animal diversity is probably necessary in order to control for the pretty big seasonal fluctuations.
My dataset is below, along with a plot of what the faunal diversity over time looks like.
transect<-c(rep("transA",26),rep("transB",25),rep("transC",25))
months<-as.numeric(c(1:26,1:11,13:26,0,2,4:26))
animal_species<-c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df<-data.frame(transect,months,animal_species)
library(ggplot2)
ggplot(animal_df,aes(months,animal_species))+geom_bar(stat='identity')+theme_bw()+facet_grid(transect~.)
BUT there are two problems, both of which violate the assumptions of ANOVA!
The first is that the variance in numbers of species differs widely between transects, and according to a Levene's (median) test, the variances are not the same.
library(car) # provides leveneTest()
animal_AOV<-aov(animal_species~transect, data=animal_df)
leveneTest(animal_AOV)
# Levene's Test for Homogeneity of Variance (center = median)
#       Df F value    Pr(>F)
# group  2  10.783 7.889e-05 ***
#       73
The second is that the data seem to follow different distributions, as is probably most easily seen from the histograms of diversity per transect, where transA seems to have less skew than the other two.
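par(mfrow=c(3,1))
# Split the data by transect (these subsets were not defined above)
TransA<-subset(animal_df,transect=="transA")
TransB<-subset(animal_df,transect=="transB")
TransC<-subset(animal_df,transect=="transC")
hist(TransA$animal_species,breaks=14,xlim=c(0,14))
hist(TransB$animal_species,breaks=10,xlim=c(0,14))
hist(TransC$animal_species,breaks=10,xlim=c(0,14))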
My questions to the community are:
Am I correct in thinking that the repeated measures approach is the most sensible analysis pathway?
Are the departures from the assumptions of ANOVA enough to worry about, seeing as there are more than 20 observations per transect and the numbers of observations are relatively well balanced?
How should such an analysis be coded to produce a viable answer (possibly taking into account the violations)? Much of the information online on repeated measures ANOVA seems to conflict on how such an analysis should be put together.
I have essentially a simple question, and my hunch is that it should fall out as the three transects being significantly different from one another (at least transA having higher diversity than the other two). Does anyone have any suggestions for how to tackle this?
The skewness can be explained by the fact that you are using count data. Count data usually follow a Poisson distribution rather than a normal distribution, so ideally you would use some sort of Poisson regression combined with a random effect for the repeated measures.
For more extensive information I would advise you to speak to a statistician or search for 'Mixed-effects Poisson Regression Model'.
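One quick way to see why count data behave differently from normally distributed data is to compare the mean and variance of the counts within each transect: under a Poisson model the variance equals the mean, so a transect with more species is also expected to vary more. The sketch below uses only base R and the data from the question; the names `transect_mean`, `transect_var` and `ratio` are introduced here for illustration. These raw ratios ignore the month-to-month seasonal variation, so they are a rough guide and not a formal test.

```r
# Data copied from the question
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)

# Per-transect mean and variance of the species counts
transect_mean <- tapply(animal_species, transect, mean)
transect_var  <- tapply(animal_species, transect, var)

# Under a pure Poisson model this ratio would be close to 1
ratio <- transect_var / transect_mean

round(cbind(mean = transect_mean, variance = transect_var, ratio = ratio), 2)
```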
Two general issues:
Personal preference for line plots - then you can overlay the data and compare them more effectively:
ggplot(animal_df,aes(months,animal_species,colour=transect))+
geom_line()+theme_bw()+scale_y_log10()
ggsave("animal1.png")
The zero count data have disappeared since we plotted on a log scale, but this does make it clearer that the transects don't differ much in variance on this scale.
Use the lme4 package to fit a repeated measures/longitudinal Poisson GLMM:
library(lme4)
m1 <- glmer(animal_species~transect+(1|months),
family=poisson,data=animal_df)
Check for overdispersion (the ratio is <1, so no problem):
deviance(m1)/df.residual(m1) ## 0.65
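The deviance-based ratio is one common rule of thumb; a frequently used alternative is the sum of squared Pearson residuals divided by the residual degrees of freedom. A minimal sketch, assuming the lme4 package is installed and refitting the same model so that the block runs on its own (the name `pearson_ratio` is introduced here for illustration):

```r
library(lme4)

# Data copied from the question
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- as.numeric(c(1:26, 1:11, 13:26, 0, 2, 4:26))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df <- data.frame(transect, months, animal_species)

# Same Poisson GLMM as above: transect fixed effect, random intercept per month
m1 <- glmer(animal_species ~ transect + (1 | months),
            family = poisson, data = animal_df)

# Pearson chi-square statistic divided by the residual degrees of freedom;
# values well above 1 would point to overdispersion
pearson_ratio <- sum(residuals(m1, type = "pearson")^2) / df.residual(m1)
pearson_ratio
```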
Results:
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
#  Family: poisson  ( log )
# Formula: animal_species ~ transect + (1 | months)
#    Data: animal_df
#       AIC       BIC    logLik  deviance  df.resid
#  319.3219  328.6449 -155.6610  311.3219        72
# Random effects:
#  Groups Name        Std.Dev.
#  months (Intercept) 0.3003
# Number of obs: 76, groups:  months, 27
# Fixed Effects:
#    (Intercept)  transecttransB  transecttransC
#         1.7110         -0.4792         -0.8847
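To address the original question of whether the transects differ overall, one option is a likelihood ratio test that compares this model with one that drops the transect term. This is a sketch of one reasonable approach, not the only valid one; it assumes lme4 is installed, `m0` and `lrt` are names introduced here, and the data are repeated so that the block is self-contained.

```r
library(lme4)

# Data copied from the question
transect <- c(rep("transA", 26), rep("transB", 25), rep("transC", 25))
months <- as.numeric(c(1:26, 1:11, 13:26, 0, 2, 4:26))
animal_species <- c(2,2,2,4,5,1,5,6,14,8,7,5,5,3,1,2,5,9,8,9,10,10,9,9,7,3,
                    1,3,2,2,3,3,3,7,5,6,5,4,2,2,4,4,5,7,4,5,2,4,2,4,1,
                    1,1,1,3,2,2,3,2,2,1,3,5,3,2,4,2,4,3,6,3,2,2,1,2,1)
animal_df <- data.frame(transect, months, animal_species)

# Full model: transect effect plus a random intercept for each month
m1 <- glmer(animal_species ~ transect + (1 | months),
            family = poisson, data = animal_df)

# Null model: same random effect, no transect effect
m0 <- glmer(animal_species ~ 1 + (1 | months),
            family = poisson, data = animal_df)

# Likelihood ratio test for the overall transect effect (2 degrees of freedom)
lrt <- anova(m0, m1)
lrt
```

The `Pr(>Chisq)` column of the printed table gives the p-value for the transect term as a whole; pairwise contrasts between transects would need a separate step.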
Check the location-scale plot:
png("animal2.png")
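# transect is a character column in R >= 4.0, so convert it to integer colour codes
plot(m1,sqrt(abs(resid(.)))~fitted(.),
type=c("p","smooth"),col=as.integer(factor(animal_df$transect)))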
dev.off()
No apparent change in variance across groups/number of counts ...
Overlay the results on the data (original scale this time):
pp <- animal_df
pp$animal_species <- predict(m1,type="response")
ggplot(animal_df,aes(months,animal_species,colour=transect))+
geom_point()+
geom_line(data=pp)+theme_bw()
ggsave("animal3.png")