I am preparing course material for dplyr in R. Assuming that our data set is "iris", one can calculate the mean and sd of all columns with the summarise_all function:
iris %>%
group_by(Species) %>%
summarise_all(funs(mean, sd), na.rm=TRUE)
However, when I try to calculate the standard error, I get an error message:
iris %>%
group_by(Species) %>%
summarise_all(funs(mean, sd, se = sd/sqrt(n)), na.rm=TRUE)
Any help is highly appreciated.
You can use:
library(dplyr)
iris %>%
group_by(Species) %>%
summarise_all(list(mean = ~mean(.), sd = ~sd(.), se = ~sd(./sqrt(.))))
Or a shorter version, which does not give you the column names you want:
iris %>% group_by(Species) %>% summarise_all(list(mean, sd, se = ~sd(./sqrt(.))))
For anyone stumbling across this: I'm fairly sure the other answers miscalculate the SE, dividing by the square root of the values themselves instead of by the square root of the sample size. I don't have the reputation to comment on those answers, but substituting
se = ~sd(.x)/sqrt(length(.x))
into the above formulas should work.
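Put together, the substitution might look like the following. This is only a sketch, assuming the standard error of the mean is wanted and that there are no missing values (true for iris); the name res is just an illustrative variable.

```r
library(dplyr)

# Standard error of the mean: sd divided by the square root of the group size.
res <- iris %>%
  group_by(Species) %>%
  summarise_all(list(mean = ~mean(.x),
                     sd   = ~sd(.x),
                     se   = ~sd(.x) / sqrt(length(.x))))

# Result columns are named <column>_<statistic>, e.g. Sepal.Length_se
res
```

Because the list is named, every statistic gets its own suffix, so the se columns can be checked by hand against the matching sd columns divided by sqrt(50), iris having 50 rows per species.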
We can use summarise with across in the new release:
library(dplyr)
iris %>%
group_by(Species) %>%
summarise(across(everything(), list(mean = mean, sd = sd, se = ~sd(.)/sqrt(length(.)))))
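If the columns can contain missing values, as the na.rm=TRUE in the question suggests, one option is a small helper that drops NAs before counting. The function se below is a hypothetical helper written for this sketch, not something provided by dplyr or base R, and res_across is just an illustrative name.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or base R): standard error of the mean.
# With na.rm = TRUE, missing values are removed before both sd() and the count.
se <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

res_across <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(),
                   list(mean = ~mean(.x, na.rm = TRUE),
                        sd   = ~sd(.x, na.rm = TRUE),
                        se   = ~se(.x, na.rm = TRUE))))

res_across
```

Removing the NAs inside the helper keeps the denominator equal to the number of observations actually used by sd(), which length(.x) alone would overstate.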