
Can I calculate the standard error of all columns with the "summarise_all" function in R dplyr?

I am preparing course material on dplyr in R. Taking the "iris" data set as an example, the mean and sd of all columns can be calculated with the summarise_all function:

iris %>% 
  group_by(Species) %>% 
  summarise_all(funs(mean, sd), na.rm=TRUE)

However, when I try to calculate the standard error, I get an error message:

iris %>% 
  group_by(Species) %>% 
  summarise_all(funs(mean, sd, se = sd/sqrt(n)), na.rm=TRUE)

Any help is highly appreciated

You can use:

library(dplyr)
iris %>% 
  group_by(Species) %>% 
  summarise_all(list(mean = ~mean(.), sd = ~sd(.), se = ~sd(./sqrt(.))))

Or a shorter version, which does not give you the column names you want:

iris %>% group_by(Species) %>% summarise_all(list(mean, sd, se = ~sd(./sqrt(.))))

For anyone stumbling across this, I'm fairly sure the other answers are miscalculating the SE: they divide by the square root of the data values themselves rather than by the square root of the sample size. I don't have the reputation to comment on those, but substituting

se = ~sd(.x)/sqrt(length(.x))

into the above formulas should work.
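Put together, a minimal sketch of the full pipeline with that substitution might look like the following. It assumes dplyr 0.8 or later (for the list-of-lambdas form of summarise_all) and that the columns contain no missing values, so that length(.x) equals the group size. The name iris_se is only illustrative.

```r
library(dplyr)

# Standard error = sd of the column / sqrt(number of observations in the group).
# Assumes no NAs, so length(.x) is the true sample size of each group.
iris_se <- iris %>%
  group_by(Species) %>%
  summarise_all(list(
    mean = ~mean(.x),
    sd   = ~sd(.x),
    se   = ~sd(.x) / sqrt(length(.x))
  ))

iris_se
```

The named list gives output columns suffixed with _mean, _sd and _se, for example Sepal.Length_se.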

We can use summarise with across in newer releases of dplyr (1.0.0 and later):

library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd, se = ~sd(.x)/sqrt(length(.x)))))
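Since the original question passed na.rm = TRUE, it may help to wrap the standard error in a small helper that drops missing values before counting observations. The sketch below is one way to do that. std_error and iris_summary are hypothetical names introduced here, not part of dplyr, and the code assumes dplyr 1.0.0 or later for across.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): standard error of the mean.
# With na.rm = TRUE, missing values are dropped before n is counted,
# so the denominator matches the values that sd() actually uses.
std_error <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(across(
    everything(),
    list(
      mean = ~mean(.x, na.rm = TRUE),
      sd   = ~sd(.x, na.rm = TRUE),
      se   = ~std_error(.x, na.rm = TRUE)
    )
  ))

iris_summary
```

Counting only the non-missing values matters because length() on its own includes NAs, which would make the standard error too small whenever a column has gaps.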
