I've been taught to run an ANOVA with the formula: aov(dependent variable~independent variable, dataset)
but I am struggling with how to run an ANOVA for a particular dataset because it is broken up into three columns that each contain a value. The three columns are designated newborn, adolescent and adult (which is hamster age) and the values within each column represent blood pressure values. I need to run a test to determine if there is a relationship between blood pressure and age.
This is what the data looks like in R:
> hamster
   Newborn adolescent adult
1      108        110   105
2      110        105   100
3       90        100    95
4       80         90    85
5      100        102    97
6      120        110   105
7      125        105   100
8      130        115   110
9      120        100    95
10     130        120   115
11     145        130   125
12     150        125   120
13     130        135   130
14     155        130   125
15     140        120   115
I'm confused because the dependent variable is the set of values within each column, rather than a single column of its own.
The first step is to rearrange your data so it's in a "long" format instead of a "wide" format. This can be done in base R using the reshape function, but it's much easier to use the gather function in the tidyr package:
library(tidyr)
# Stack the three age columns into key/value pairs, then fit the one-way ANOVA
result <- hamster %>%
  gather(age, bp) %>%
  aov(bp ~ age, .)
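For comparison, here is a minimal sketch of the base R route with reshape(). It assumes the wide data frame from the question, re-entered by hand so the snippet runs on its own. The names bp, age and id are choices made for this example, not something the question specifies.

```r
# Hamster blood pressure data copied from the question (wide format)
hamster <- data.frame(
  Newborn    = c(108, 110, 90, 80, 100, 120, 125, 130, 120, 130, 145, 150, 130, 155, 140),
  adolescent = c(110, 105, 100, 90, 102, 110, 105, 115, 100, 120, 130, 125, 135, 130, 120),
  adult      = c(105, 100, 95, 85, 97, 105, 100, 110, 95, 115, 125, 120, 130, 125, 115)
)

# Base R: produce one row per hamster/age combination
long <- reshape(
  hamster,
  direction = "long",
  varying   = names(hamster),  # wide columns to stack
  v.names   = "bp",            # name for the stacked value column
  timevar   = "age",           # name for the column recording the source column
  times     = names(hamster),  # labels written into `age`
  idvar     = "id"             # row identifier; created by reshape() if absent
)

# Treat age as a factor, keeping the original column order as level order
long$age <- factor(long$age, levels = names(hamster))

# Same one-way model as the tidyr version
result <- aov(bp ~ age, data = long)
```

The argument list is longer than with gather, which is the trade-off the answer refers to.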
Using tidyr also gives us the pipe operator (%>%), which lets you chain commands together in a readable way. By default, it works by taking the result of the previous function and inserting it as the first argument of the next function. In your aov call, we overrode this using the . placeholder to explicitly pass the data set produced by gather as the second argument.
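To make the placeholder behaviour concrete, here is a small sketch on toy data. The data frame df and the lm call are illustrative only and are not part of the hamster analysis.

```r
library(tidyr)  # re-exports the %>% pipe from magrittr

# Default: the left-hand side becomes the first argument
roots <- c(1, 4, 9) %>% sqrt()   # same as sqrt(c(1, 4, 9))

# With ".": the left-hand side goes wherever the dot appears
df  <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))
fit <- df %>% lm(y ~ x, .)       # same as lm(y ~ x, data = df)

coef(fit)
```

The same pattern is what places the reshaped data in the second position of aov.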
R has a useful function called stack to convert your data into the format needed for ANOVA.
aov(values ~ ind, stack(hamster))
# Call:
#    aov(formula = values ~ ind, data = stack(hamster))
#
# Terms:
#                       ind Residuals
# Sum of Squares   1525.378 11429.867
# Deg. of Freedom         2        42
#
# Residual standard error: 16.49666
# Estimated effects may be unbalanced
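It may help to look at what stack returns before fitting, and to call summary() on the fit to get the F test. The sketch below assumes the three-column data frame from the question, re-entered by hand. If your data frame also carries an id column, select only the measurement columns first, since stack would otherwise stack id as well.

```r
# Hamster blood pressure data copied from the question (wide format)
hamster <- data.frame(
  Newborn    = c(108, 110, 90, 80, 100, 120, 125, 130, 120, 130, 145, 150, 130, 155, 140),
  adolescent = c(110, 105, 100, 90, 102, 110, 105, 115, 100, 120, 130, 125, 135, 130, 120),
  adult      = c(105, 100, 95, 85, 97, 105, 100, 110, 95, 115, 125, 120, 130, 125, 115)
)

# stack() returns two columns: `values` (numeric) and `ind` (factor of column names)
long <- stack(hamster[c("Newborn", "adolescent", "adult")])
head(long)

# Fit the one-way ANOVA and print the table with the F statistic and p-value
fit <- aov(values ~ ind, data = long)
summary(fit)
```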
Code to run a repeated measures analysis of variance with one within subject variable and no between subjects variables is as follows. Note that we use group_by() from the dplyr package to retain the hamster id number so we can use it as the error term in the ANOVA.
hamsterData <- "id Newborn adolescent adult
1 108 110 105
2 110 105 100
3 90 100 95
4 80 90 85
5 100 102 97
6 120 110 105
7 125 105 100
8 130 115 110
9 120 100 95
10 130 120 115
11 145 130 125
12 150 125 120
13 130 135 130
14 155 130 125
15 140 120 115"
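hamster <- read.table(text = hamsterData, header = TRUE)
library(tidyr)
library(dplyr)
# Reshape to long format: one row per hamster (id) and age
result <- hamster %>%
  group_by(id) %>%
  gather(age, bp, Newborn, adolescent, adult)
# Keep the age levels in developmental order
result$age <- factor(result$age, levels = c("Newborn", "adolescent", "adult"))
# Sum-to-zero contrasts for unordered factors, polynomial contrasts for ordered ones
options(contrasts = c("contr.sum", "contr.poly"))
# Error(factor(id)) treats each hamster as its own block
modelAOV <- aov(bp ~ age + Error(factor(id)), data = result)
summary(modelAOV)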
...and the output:
> modelAOV <- aov(bp ~ age + Error(factor(id)),data = result)
> summary(modelAOV)

Error: factor(id)
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 14  10013   715.2

Error: Within
          Df Sum Sq Mean Sq F value  Pr(>F)
age        2   1525   762.7   15.07 3.6e-05 ***
Residuals 28   1417    50.6
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
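As a quick arithmetic check, the numbers printed in this thread can be tied together. The sketch below only reuses values copied from the outputs shown, rounded as printed, and the variable names are made up for the example. The residual sum of squares from the one-way fit splits into a between-hamster part and a within-hamster part, and the F value for age is the ratio of the two within-stratum mean squares.

```r
# Values copied from the printed outputs (rounded as shown there)
ss_between_subjects <- 10013      # Error: factor(id), Residuals
ss_within_residual  <- 1417       # Error: Within, Residuals
ss_oneway_residual  <- 11429.867  # Residuals from aov(values ~ ind, ...)

# The two repeated measures residual terms add up to the one-way residual
ss_between_subjects + ss_within_residual   # 11430, equal up to rounding

# F for age = mean square for age / within-subject residual mean square
ms_age   <- 762.7
ms_resid <- 50.6
ms_age / ms_resid                          # about 15.07
```

Removing the hamster-to-hamster variation from the error term is why the age effect tests much more strongly here than in the one-way fit.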