Specific Summary Statistics for Multiple Variables by Factor Level
I am trying to get the mean, sd, min, max, and range for the mpg, price, weight, and repair record variables, grouped by the two factor levels (domestic and foreign) of a variable called foreign. I've come across many examples that show how to get one statistic, like the mean, on multiple variables, or how to get multiple statistics for one variable grouped by two factor levels. However, I haven't found anything particularly useful for developing the table that I've described above.
I've tried many things, and it appears that ddply might be what I should be using. I think it should be something like ddply(df,[column I want to use as factor level], mean=mean(value),... but I am unsure of the syntax. Thanks for any help!
I would favour a tidyverse approach, such as:
library(tibble)
library(dplyr)

mtcars %>%
  rownames_to_column() %>%
  as_tibble() %>%
  group_by(rowname) %>%
  summarise_all(
    # a named list of functions; funs() is deprecated since dplyr 0.8.0
    list(mean = mean, median = median, min = min, max = max, sd = sd)
  )
# # A tibble: 32 x 56
# rowname mpg_mean cyl_mean disp_mean hp_mean drat_mean wt_mean qsec_mean
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30
# 2 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98
# 3 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41
# 4 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42
# 5 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61
# 6 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87
# 7 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84
# 8 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50
# 9 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47
# 10 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90
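To get closer to the table described in the question, the same idea can be grouped by a two-level variable instead of by row name. The following is only a sketch under a few assumptions: the asker's data is not available here, so `am` from mtcars stands in for `foreign`, and `mpg`, `wt`, and `hp` stand in for the variables of interest. It also uses `across()`, which requires dplyr 1.0.0 or later and is an alternative to the `summarise_all()` call above, not something from the original answer.

```r
library(dplyr)

# Assumption: `am` (0 = automatic, 1 = manual) plays the role of the
# asker's two-level `foreign` variable; mpg, wt and hp are stand-ins
# for mpg, price, weight and repair record.
by_am <- mtcars %>%
  group_by(am) %>%
  summarise(
    across(
      c(mpg, wt, hp),
      list(
        mean  = mean,
        sd    = sd,
        min   = min,
        max   = max,
        # "range" here means the width max - min, one number per group
        range = function(x) max(x) - min(x)
      )
    )
  )

by_am
```

The result has one row per level of `am` and one column per variable/statistic pair, named like `mpg_mean` or `wt_range`.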
...or using summarise_if with the is.numeric predicate:
library(dplyr)

starwars %>%
  group_by(homeworld) %>%
  summarise_if(
    is.numeric,
    # a named list of functions; funs() is deprecated since dplyr 0.8.0
    list(mean = mean, median = median, min = min, max = max, sd = sd)
  )
# # A tibble: 49 x 16
# homeworld height_mean mass_mean birth_year_mean height_median mass_median birth_year_median height_min
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Alderaan 176.3333 NA NA 188 NA NA 150
# 2 Aleen Minor 79.0000 15.0 NA 79 15.0 NA 79
# 3 Bespin 175.0000 79.0 37 175 79.0 37 175
# 4 Bestine IV 180.0000 110.0 NA 180 110.0 NA 180
# 5 Cato Neimoidia 191.0000 90.0 NA 191 90.0 NA 191
# 6 Cerea 198.0000 82.0 92 198 82.0 92 198
# 7 Champala 196.0000 NA NA 196 NA NA 196
# 8 Chandrila 150.0000 NA 48 150 NA 48 150
# 9 Concord Dawn 183.0000 79.0 66 183 79.0 66 183
# 10 Corellia 175.0000 78.5 25 175 78.5 25 170
...you can always add arguments to the functions if necessary, such as na.rm, like this: mean(., na.rm = TRUE) (written as ~ mean(., na.rm = TRUE) when the functions are passed in a list rather than via funs()).
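As a rough illustration of passing na.rm, here is a sketch that drops missing values before computing each statistic. It assumes dplyr 1.0.0 or later and uses `across()` with `where(is.numeric)` in place of `summarise_if()`; that substitution is mine, not the original answer's. Only mean and sd are shown, because min and max warn and return infinite values for groups where every value is missing.

```r
library(dplyr)

na_safe <- starwars %>%
  group_by(homeworld) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        # formula-style lambdas let extra arguments be passed through
        mean = ~ mean(.x, na.rm = TRUE),
        sd   = ~ sd(.x, na.rm = TRUE)
      )
    )
  )

na_safe
```

Groups in which every value of a column is missing still come back as NaN (for the mean) or NA (for the sd), since there is nothing left to summarise.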