
How do I statistically analyze groups with different numbers of individuals and nested treatments?

I am currently working on my MSc thesis, but I fear I don't have the statistical knowledge to analyze this data. In my experiment there are 3 plant species with 31, 40 and 82 individuals. Each individual has its own weight and height, so I need to account for that as well. There are 6 treatments in total; each treatment combines an erosion level and a flow speed (how quickly the water streams through the setup). For each erosion level, I used 2 different flow speeds, so I assume that flow speed is nested/blocked within erosion. For every treatment, the angle of the plant's stem is measured relative to vertical. Every individual stem went through each treatment in the exact same order. Is there any way I can incorporate all of this into a statistical analysis, preferably in R? I tried to make a schematic overview of what my data looks like. (image)

You would be the savior of my MSc thesis :)

Thanks in advance and have a nice day.

I already tried to put the treatments into a vector, but since the control groups all have different lengths it does not work.

With multiple categorical predictors (erosion, flowspeed, and species), this is similar to what's called "repeated measures analysis of variance." That's a linear model in which you account for the repeated measurements on the same individuals. The problem is that classic repeated measures analysis of variance assumes equal numbers of observations in each treatment/species group, which you don't have.

One way to deal with the different numbers of observations is a linear mixed model. You use erosion, flowspeed, and species as fixed-effect predictors, the angle as the outcome, and treat the individual plants as providing a random effect.

You set up 1 row of data for each observation, with columns for angle, erosion, flowspeed, species, and an ID indicating the individual plant. With only 6 combinations of erosion with flowspeed, it's best to code them as categorical predictors, not as numeric. Include height and weight also on each line if you want to include those variables in the analysis. Use a set of ID values from 1 to 153 instead of re-numbering from 1 within each species. Otherwise the software will think that the plants with ID=1 are all the same individual, and a member of all 3 species!
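A minimal sketch of that layout in base R, assuming the 6 treatments are 3 erosion levels crossed with the same 2 flow speeds. The species labels (A, B, C) and the level names are placeholders I made up; only the group sizes come from the question.

```r
# Group sizes from the question; the species labels A/B/C are placeholders.
n_per_species <- c(A = 31, B = 40, C = 82)

# One row per plant, with an ID that is unique across all species (1..153).
plants <- data.frame(
  ID      = factor(seq_len(sum(n_per_species))),
  species = factor(rep(names(n_per_species), times = n_per_species))
)

# The 6 treatments, coded as categorical predictors (level names are placeholders).
treatments <- expand.grid(
  erosion   = factor(c("e1", "e2", "e3")),
  flowspeed = factor(c("f1", "f2"))
)

# Cartesian product: every plant appears once per treatment (153 * 6 rows).
myData <- merge(plants, treatments, by = NULL)
myData$angle <- NA_real_   # to be filled in with the measured stem angles

str(myData)
table(myData$species, myData$erosion, myData$flowspeed)
```

The table at the end shows the unequal group sizes directly: each erosion/flowspeed cell holds 31, 40 or 82 rows depending on the species.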

With the lme4 package in R, you could start with something like:

library(lme4)
myModel <- lmer(angle ~ erosion*flowspeed*species + (1|ID), data = myData)

That allows for different associations with angle depending on the combinations of erosion and flowspeed and species. It takes the repeated measurements into account by estimating different intercepts (estimated angle at reference levels of erosion and flowspeed and species) for the 153 individuals (ID). You don't need to worry about terminology like "nested." The software will correctly interpret the distribution of ID values among the treatment/species combinations.

That will return a large number of fixed-effect regression coefficients: by my quick count, in addition to the intercept, 2 for erosion, 1 for flowspeed, 2 for species, 2 for erosion:flowspeed interactions, 4 for erosion:species interactions, 2 for flowspeed:species interactions, and 4 for erosion:flowspeed:species interactions. Do NOT spend much time trying to figure those coefficients out individually. They describe the model in a way that subsequent analysis with other software will make clearer. You will also get an estimate of the variance among the ID-specific intercept values.
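If you want to check that count yourself, here is a small sketch that builds the fixed-effect design matrix with base R and counts columns per term. It assumes 3 erosion levels, 2 flow speeds and 3 species with R's default treatment contrasts; the level names are placeholders.

```r
# Placeholder levels; only the number of levels per factor matters here.
grid <- expand.grid(
  erosion   = factor(c("e1", "e2", "e3")),
  flowspeed = factor(c("f1", "f2")),
  species   = factor(c("A", "B", "C"))
)

# Design matrix for the fixed-effect part of the model formula.
X <- model.matrix(~ erosion * flowspeed * species, data = grid)

ncol(X)   # number of fixed-effect coefficients, intercept included

# Columns per term: 0 is the intercept, 1..7 follow the order of the terms
# (main effects, then two-way interactions, then the three-way interaction).
per_term <- table(attr(X, "assign"))
per_term
```

The total equals the number of erosion/flowspeed/species combinations (3 x 2 x 3), which is why the full interaction model is just another way of describing one mean per combination.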

I'd recommend using the Anova() function in the R car package to evaluate the overall associations of each of erosion and flowspeed and species, and their sets of interactions, with the angle outcome. The "Type II" default analysis provided by that function handles different numbers of observations properly, while the standard anova() or aov() functions in R don't: anova() performs sequential tests whose results can depend on the order of the terms when group sizes are unequal, and aov() is designed for balanced designs.
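A sketch of what that call could look like, run on simulated data because I don't have yours. Everything in the simulation (level names, effect sizes, standard deviations) is made up for illustration; only the group sizes and the model formula come from above.

```r
library(lme4)
library(car)

set.seed(1)  # simulated data, for illustration only
n_per_species <- c(A = 31, B = 40, C = 82)  # sizes from the question; labels are placeholders
plants <- data.frame(
  ID      = factor(seq_len(sum(n_per_species))),
  species = factor(rep(names(n_per_species), times = n_per_species))
)
plants$plant_effect <- rnorm(nrow(plants), sd = 3)  # made-up plant-to-plant variation
treatments <- expand.grid(erosion   = factor(c("e1", "e2", "e3")),
                          flowspeed = factor(c("f1", "f2")))
myData <- merge(plants, treatments, by = NULL)  # every plant sees all 6 treatments
myData$angle <- 20 + 5 * (myData$flowspeed == "f2") +
  myData$plant_effect + rnorm(nrow(myData), sd = 2)

myModel <- lmer(angle ~ erosion * flowspeed * species + (1 | ID), data = myData)

# Type II Wald chi-square tests, one row per main effect or interaction term.
aov_tab <- Anova(myModel)
aov_tab
```

Each row of the table tests a whole term (for example all 4 erosion:species coefficients together), which is usually the question you care about first.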

You then can use post-modeling software like that in the emmeans package to evaluate and compare predicted angle values among combinations of fixed-effect predictors.
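As a rough sketch of that step, again on simulated data (all numbers and level names below are invented for illustration), you could ask emmeans for the predicted angle in each erosion/flowspeed combination within each species, and then compare the two flow speeds within each erosion level and species:

```r
library(lme4)
library(emmeans)

set.seed(1)  # simulated data, for illustration only
n_per_species <- c(A = 31, B = 40, C = 82)  # sizes from the question; labels are placeholders
plants <- data.frame(
  ID      = factor(seq_len(sum(n_per_species))),
  species = factor(rep(names(n_per_species), times = n_per_species))
)
plants$plant_effect <- rnorm(nrow(plants), sd = 3)  # made-up plant-to-plant variation
treatments <- expand.grid(erosion   = factor(c("e1", "e2", "e3")),
                          flowspeed = factor(c("f1", "f2")))
myData <- merge(plants, treatments, by = NULL)
myData$angle <- 20 + 5 * (myData$flowspeed == "f2") +
  myData$plant_effect + rnorm(nrow(myData), sd = 2)

myModel <- lmer(angle ~ erosion * flowspeed * species + (1 | ID), data = myData)

# Estimated marginal means: one predicted angle per treatment, within each species.
emm <- emmeans(myModel, ~ erosion * flowspeed | species)
emm

# Compare the two flow speeds separately for each erosion level and species.
flow_contrasts <- pairs(emm, simple = "flowspeed")
flow_contrasts
```

Which comparisons to request depends on your hypotheses; the ones here are only one possible choice.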

You do have to check whether the assumptions of the linear model are reasonably well met. The main issue with categorical predictors is whether the ranges of residuals (differences between observed and predicted angle values) are similar over the range of predicted values. If that's not the case, you might have to consider some pre-transformation of the angle values. A reasonably normal distribution of the residuals is a plus, but not so critical when you have a large number of observations.
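One simple way to look at this is with base-R plots of the residuals, sketched here on simulated data (the data-generating numbers and level names are made up; with your own data you would skip the simulation and use your fitted model):

```r
library(lme4)

set.seed(1)  # simulated data, for illustration only
n_per_species <- c(A = 31, B = 40, C = 82)  # sizes from the question; labels are placeholders
plants <- data.frame(
  ID      = factor(seq_len(sum(n_per_species))),
  species = factor(rep(names(n_per_species), times = n_per_species))
)
plants$plant_effect <- rnorm(nrow(plants), sd = 3)  # made-up plant-to-plant variation
treatments <- expand.grid(erosion   = factor(c("e1", "e2", "e3")),
                          flowspeed = factor(c("f1", "f2")))
myData <- merge(plants, treatments, by = NULL)
myData$angle <- 20 + 5 * (myData$flowspeed == "f2") +
  myData$plant_effect + rnorm(nrow(myData), sd = 2)

myModel <- lmer(angle ~ erosion * flowspeed * species + (1 | ID), data = myData)

res <- resid(myModel)    # observed minus predicted angle
fit <- fitted(myModel)   # predicted angle, including the plant-specific intercepts

# Spread of residuals should look roughly constant across the predicted values.
plot(fit, res, xlab = "Predicted angle", ylab = "Residual")
abline(h = 0, lty = 2)

# Same check, split by treatment combination.
boxplot(res ~ interaction(myData$erosion, myData$flowspeed),
        xlab = "erosion.flowspeed", ylab = "Residual")

# Rough check of normality of the residuals.
qqnorm(res)
qqline(res)
```

A funnel shape in the first plot, or boxes of very different heights in the second, would suggest trying a transformation of angle.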

The above doesn't incorporate height and weight explicitly in the model. It includes them implicitly in the ID values and the corresponding random intercepts. You could add them as explicit predictors in the model. If you do, think carefully about the form in which to include them. If you just include them as linear terms, you are implicitly assuming that the angle is linearly and additively associated with each of height and weight on top of all of the other effects associated with erosion and flowspeed and species. Is that reasonable?
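To make the two options concrete, here is a sketch with simulated height and weight values (entirely invented, as are the level names and effect sizes). The first model adds them as linear, additive terms; the second uses natural splines from the splines package as one possible way to relax linearity, with the 3 degrees of freedom chosen arbitrarily for illustration.

```r
library(lme4)
library(splines)

set.seed(1)  # simulated data, for illustration only
n_per_species <- c(A = 31, B = 40, C = 82)  # sizes from the question; labels are placeholders
plants <- data.frame(
  ID      = factor(seq_len(sum(n_per_species))),
  species = factor(rep(names(n_per_species), times = n_per_species))
)
plants$height <- rnorm(nrow(plants), mean = 30, sd = 5)   # made-up values and units
plants$weight <- rnorm(nrow(plants), mean = 10, sd = 2)   # made-up values and units
plants$plant_effect <- rnorm(nrow(plants), sd = 3)
treatments <- expand.grid(erosion   = factor(c("e1", "e2", "e3")),
                          flowspeed = factor(c("f1", "f2")))
myData <- merge(plants, treatments, by = NULL)
myData$angle <- 20 + 5 * (myData$flowspeed == "f2") - 0.3 * myData$height +
  myData$plant_effect + rnorm(nrow(myData), sd = 2)

# Height and weight as linear, additive terms on top of the treatment structure.
m_linear <- lmer(angle ~ erosion * flowspeed * species + height + weight + (1 | ID),
                 data = myData)
fixef(m_linear)[c("height", "weight")]

# A more flexible alternative: natural splines allow curved associations.
m_spline <- lmer(angle ~ erosion * flowspeed * species +
                   ns(height, df = 3) + ns(weight, df = 3) + (1 | ID),
                 data = myData)
length(fixef(m_spline))
```

Because height and weight are constant within a plant, adding them mostly explains part of what the random intercepts were absorbing before.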

Finally, there is one limitation to the study design that you need to address in discussing your results. As all plants received the same treatments in the same order, you can't rule out the possibility of some time- or exposure-dependence of the results. That is, the results of later treatment combinations might not just depend on erosion and flowspeed and species but also on the time elapsed or treatments previously experienced.


 