I have this sample data:
Sample Replication Days
1 1 10
1 1 14
1 1 13
1 1 14
2 1 NA
2 1 5
2 1 18
2 1 20
1 2 16
1 2 NA
1 2 18
1 2 21
2 2 15
2 2 7
2 2 12
2 2 14
I have four observations for each sample with a total of 64 samples in each of the two replications. In total, I have 512 values for both the replications. I also have some missing values designated as 'NA'. I prformed ANOVA for Mean values for each Sample for each Rep that I generated using
library(tidyverse)
df <- Data %>% group_by(Sample, Rep) %>% summarise(Mean = mean(Days, na.rm = TRUE))
curve.anova <- aov(Mean~Rep+Sample, data=df)
Result of anova is:
> summary(curve.anova)
Df Sum Sq Mean Sq F value Pr(>F)
Rep 1 6.1 6.071 2.951 0.0915 .
Sample 63 1760.5 27.945 13.585 <2e-16 ***
Residuals 54 111.1 2.057
I created a table for mean and SE values,
ANOVA<-lsmeans(curve.anova, ~Sample)
ANOVA<-summary(ANOVA)
write.csv(ANOVA, file="Desktop/ANOVA.csv")
A few lines from file are:
Sample lsmean SE df lower.CL upper.CL
1 24.875 1.014145417 54 22.84176086 26.90823914
2 25.5 1.014145417 54 23.46676086 27.53323914
3 31.32575758 1.440722628 54 28.43728262 34.21423253
4 26.375 1.014145417 54 24.34176086 28.40823914
5 26.42424242 1.440722628 54 23.53576747 29.31271738
6 25.5 1.014145417 54 23.46676086 27.53323914
7 28.375 1.014145417 54 26.34176086 30.40823914
8 24.875 1.014145417 54 22.84176086 26.90823914
9 21.16666667 1.014145417 54 19.13342752 23.19990581
10 23.875 1.014145417 54 21.84176086 25.90823914
df for all 64 samples is 54 and the error bars in the ggplot are mostly equal for all the Samples. SE values are larger than the manually calculated values. Based on anova results, df=54 is for residuals.
I want to double check the ANOVA results so that they are correct and I am correctly generating lsmeans and SE to plot a bargraph using ggplot with confirdence interval error bars.
I will appreciate any help. Thank you!
After reading your comments, I think your workflow as an issue. Basically, when you are applying your anova
test, you are doing it on means of the different samples. So, in your example, when you are doing :
curve.anova <- aov(Mean~Rep+Sample, data=df)
You are comparing these values:
> df
# A tibble: 4 x 3
# Groups: Sample [2]
Sample Replication Mean
<dbl> <dbl> <dbl>
1 1 1 12.8
2 1 2 18.3
3 2 1 14.3
4 2 2 12
So, basically, you are comparing two groups with two values per group.
So, when you tried to remove the Replication
group, you get an error because the output of:
df = Data %>% group_by(Sample %>% summarise(Mean = mean(Days, na.rm = TRUE))
is now:
# A tibble: 2 x 2
Sample Mean
<dbl> <dbl>
1 1 15.1
2 2 13
So, applying anova
test on that dataset means that you are comparing two groups with one value each. So, you can't compute residuals and SE.
Instead, you should do it on the full dataset without trying to calculate the mean first:
anova_data <- aov(Days~Sample+Replication, data=Data)
anova_data2 <- aov(Days~Sample, data=Data)
And their output are:
> summary(anova_data)
Df Sum Sq Mean Sq F value Pr(>F)
Sample 1 16.07 16.071 0.713 0.416
Replication 1 9.05 9.054 0.402 0.539
Residuals 11 247.80 22.528
2 observations deleted due to missingness
> summary(anova_data2)
Df Sum Sq Mean Sq F value Pr(>F)
Sample 1 16.07 16.07 0.751 0.403
Residuals 12 256.86 21.41
2 observations deleted due to missingness
Now, you can apply lsmeans
:
A_d = summary(lsmeans(anova_data, ~Sample))
A_d2 = summary(lsmeans(anova_data2, ~Sample))
> A_d
Sample lsmean SE df lower.CL upper.CL
1 15.3 1.8 11 11.29 19.2
2 12.9 1.8 11 8.91 16.9
Results are averaged over the levels of: Replication
Confidence level used: 0.95
> A_d2
Sample lsmean SE df lower.CL upper.CL
1 15.1 1.75 12 11.33 19.0
2 13.0 1.75 12 9.19 16.8
Confidence level used: 0.95
It does not change a lot the mean and the SE (which is good because it means that your replicate are consistent and you don't have too much variabilities between those) but it reduces the confidence interval.
So, to plot it, you can:
library(ggplot2)
ggplot(A_d, aes(x=as.factor(Sample), y=lsmean)) +
geom_bar(stat="identity", colour="black") +
geom_errorbar(aes(ymin = lsmean - SE, ymax = lsmean + SE), width = .5)
Based on your initial question, if you want to check that the output of ANOVA is correct, you can mimick fake data like this:
d2 <- data.frame(Sample = c(rep(1,10), rep(2,10)),
Days = c(rnorm(10, mean =3), rnorm(10, mean = 8)))
Then,
curve.d2 <- aov(Days ~ Sample, data = d2)
ANOVA2 <- lsmeans(curve.d2, ~Sample)
ANOVA2 <- summary(ANOVA2)
And you get the following output:
> summary(curve.d2)
Df Sum Sq Mean Sq F value Pr(>F)
Sample 1 139.32 139.32 167.7 1.47e-10 ***
Residuals 18 14.96 0.83
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> ANOVA2
Sample lsmean SE df lower.CL upper.CL
1 2.62 0.288 18 2.02 3.23
2 7.90 0.288 18 7.29 8.51
Confidence level used: 0.95
And for the plot
ggplot(ANOVA2, aes(x=as.factor(Sample), y=lsmean)) +
geom_bar(stat="identity", colour="black") +
geom_errorbar(aes(ymin = lsmean - SE, ymax = lsmean + SE), width = .5)
As you can see, we get lsmeans
for d2
close to 3 and 8 what we set at the first place. So, I think your output are correct. Maybe your data do not present any significant differences and the computation of SE are the same because the distribution of your data are the same. It is what it is.
I hope this answer helps you.
Data
df = data.frame(Sample = c(rep(1,4), rep(2,4),rep(1,4), rep(2,4)),
Replication = c(rep(1,8), rep(2,8)),
Days = c(10,14,13,14,NA,5,18,20,16,NA,18,21,15,7,12,14))
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.