Obtain basic statistics (min, mean, max, sd) using dplyr?
I have a basic data frame:
a = c(1,4,3,5)
b = c(3,6,3,11)
mydata = data.frame(a,b)
I would like to obtain a data frame with the same two columns (a and b), but with the basic statistics as rows.
Is there a dplyr command for this?
It may be better to reshape the data into 'long' format first and then summarise by group:
library(dplyr)
library(tidyr)
mydata %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise_at(vars(value), list(Min = min, Mean = mean, Max = max, Sd = sd))
# A tibble: 2 x 5
# name Min Mean Max Sd
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 a 1 3.25 5 1.71
#2 b 3 5.75 11 3.77
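The output above has the statistics as columns, whereas the question asks for them as rows with columns a and b. The following is a minimal sketch, assuming dplyr >= 1.0.0 (where summarise_at() is superseded by across()) and tidyr >= 1.0.0. It computes the same four statistics and then pivots the result so each statistic becomes a row. The object names stats_long and stats_wide are illustrative only.

```r
library(dplyr)
library(tidyr)

a <- c(1, 4, 3, 5)
b <- c(3, 6, 3, 11)
mydata <- data.frame(a, b)

# One row per original column, one column per statistic.
# across() replaces the superseded summarise_at(); .names = "{.fn}"
# keeps the short names Min, Mean, Max, Sd instead of value_Min, etc.
stats_long <- mydata %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(across(value,
                   list(Min = min, Mean = mean, Max = max, Sd = sd),
                   .names = "{.fn}"))

# Reshape so each statistic is a row and a, b are columns again.
stats_wide <- stats_long %>%
  pivot_longer(-name, names_to = "stat") %>%
  pivot_wider(names_from = name, values_from = value)

stats_wide
```

stats_wide should have one row per statistic (Min, Mean, Max, Sd) and the columns stat, a and b, which matches the layout described in the question.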
In base R, we can use sapply():
sapply(mydata, summary)
#> a b
#> Min. 1.00 3.00
#> 1st Qu. 2.50 3.00
#> Median 3.50 4.50
#> Mean 3.25 5.75
#> 3rd Qu. 4.25 7.25
#> Max. 5.00 11.00
Or, if you don't want the quartiles:
# c() (rather than list()) makes sapply() return a plain numeric matrix
sapply(mydata, function(x) c(Min = min(x), Mean = mean(x),
                             Max = max(x), Sd = sd(x)))
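Because sapply() returns a matrix rather than a data frame, one more step gives the exact shape from the question. This is a minimal base-R sketch, assuming the same mydata; the helper name basic_stats is hypothetical. The statistic names end up as row names of the data frame.

```r
a <- c(1, 4, 3, 5)
b <- c(3, 6, 3, 11)
mydata <- data.frame(a, b)

# Hypothetical helper: returns a named numeric vector of four statistics.
basic_stats <- function(x) {
  c(Min = min(x), Mean = mean(x), Max = max(x), Sd = sd(x))
}

# sapply() binds the per-column vectors into a 4 x 2 numeric matrix;
# as.data.frame() turns it into a data frame and keeps "Min", "Mean", ...
# as row names.
stats_df <- as.data.frame(sapply(mydata, basic_stats))
stats_df
```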
A tidyverse solution is also possible using purrr::map:
library(purrr)
mydata %>%
  map(~summary(.)) %>%
  rbind.data.frame
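A variant of the purrr approach that keeps the statistic labels as row names is sketched below, assuming purrr and dplyr (loaded here only for the %>% pipe). summary() on a numeric vector returns a "summaryDefault"/"table" object, so as.data.frame() on it directly would dispatch to the table method; unclass() first turns each result into a plain named numeric vector. The name summary_df is illustrative only.

```r
library(dplyr)
library(purrr)

a <- c(1, 4, 3, 5)
b <- c(3, 6, 3, 11)
mydata <- data.frame(a, b)

# map() applies summary() to each column, giving a named list (a, b);
# unclass() strips the summaryDefault/table class so each element is a
# named numeric vector of length 6. as.data.frame() then binds them as
# columns and takes the names ("Min.", "1st Qu.", ...) as row names.
summary_df <- mydata %>%
  map(~ unclass(summary(.x))) %>%
  as.data.frame()

summary_df
```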
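Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.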