Specific summary statistics for multiple variables by factor level

Question

I am trying to get the mean, sd, min, max, and range for mpg, price, weight, and repair record, grouped by the two levels (domestic and foreign) of a factor variable called foreign. I've come across many examples that show how to get one statistic, such as the mean, for multiple variables, or how to get multiple statistics for one variable grouped by two factor levels. However, I haven't found anything particularly useful for building the table I've described above.

I've tried many things, and it appears that ddply might be what I should be using. I think it should be something like ddply(df,[column I want to use as factor level], mean=mean(value),... but I am unsure of the syntax. Thanks for any help!

Answer 1

I would favour a tidyverse approach, such as:

library(tibble)
library(dplyr)

mtcars %>%
  rownames_to_column() %>%
  as_tibble() %>%
  group_by(rowname) %>%
  summarise_all(
    funs(mean = mean, median = median, min = min, max = max, sd = sd)
  )

# # A tibble: 32 x 56
#              rowname mpg_mean cyl_mean disp_mean hp_mean drat_mean wt_mean qsec_mean
#                <chr>    <dbl>    <dbl>     <dbl>   <dbl>     <dbl>   <dbl>     <dbl>
# 1        AMC Javelin     15.2        8     304.0     150      3.15   3.435     17.30
# 2 Cadillac Fleetwood     10.4        8     472.0     205      2.93   5.250     17.98
# 3         Camaro Z28     13.3        8     350.0     245      3.73   3.840     15.41
# 4  Chrysler Imperial     14.7        8     440.0     230      3.23   5.345     17.42
# 5         Datsun 710     22.8        4     108.0      93      3.85   2.320     18.61
# 6   Dodge Challenger     15.5        8     318.0     150      2.76   3.520     16.87
# 7         Duster 360     14.3        8     360.0     245      3.21   3.570     15.84
# 8       Ferrari Dino     19.7        6     145.0     175      3.62   2.770     15.50
# 9           Fiat 128     32.4        4      78.7      66      4.08   2.200     19.47
# 10         Fiat X1-9     27.3        4      79.0      66      4.08   1.935     18.90
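
Grouping by rowname leaves a single row per group, so the sd columns come back as NA. A sketch closer to the table asked for in the question might group by a factor with several rows per level. The version below assumes dplyr 1.0.0 or later, where across() is available and funs() is deprecated. Here mtcars$am stands in for the asker's foreign variable, and mpg, hp, and wt stand in for the variables of interest. The range is taken as max minus min, which is an assumption about what the question means.

```r
library(dplyr)

# Assumption: `am` (0 = automatic, 1 = manual) plays the role of `foreign`
stats <- mtcars %>%
  mutate(am = factor(am, labels = c("automatic", "manual"))) %>%
  group_by(am) %>%
  summarise(
    across(
      c(mpg, hp, wt),
      list(
        mean  = mean,
        sd    = sd,
        min   = min,
        max   = max,
        # range as a single number (max - min), not the pair range() returns
        range = ~ max(.x) - min(.x)
      )
    )
  )

stats
```

The result should have one row per level of am and columns named like mpg_mean, mpg_sd, and so on, following the default {.col}_{.fn} naming of across().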

...or using summarise_if with the is.numeric predicate:

library(dplyr)

starwars %>%
  group_by(homeworld) %>%
  summarise_if(
    is.numeric,
    funs(mean = mean, median = median, min = min, max = max, sd = sd)
  )

# # A tibble: 49 x 16
#        homeworld height_mean mass_mean birth_year_mean height_median mass_median birth_year_median height_min
#            <chr>       <dbl>     <dbl>           <dbl>         <dbl>       <dbl>             <dbl>      <dbl>
# 1       Alderaan    176.3333        NA              NA           188          NA                NA        150
# 2    Aleen Minor     79.0000      15.0              NA            79        15.0                NA         79
# 3         Bespin    175.0000      79.0              37           175        79.0                37        175
# 4     Bestine IV    180.0000     110.0              NA           180       110.0                NA        180
# 5 Cato Neimoidia    191.0000      90.0              NA           191        90.0                NA        191
# 6          Cerea    198.0000      82.0              92           198        82.0                92        198
# 7       Champala    196.0000        NA              NA           196          NA                NA        196
# 8      Chandrila    150.0000        NA              48           150          NA                48        150
# 9   Concord Dawn    183.0000      79.0              66           183        79.0                66        183
# 10      Corellia    175.0000      78.5              25           175        78.5                25        170

...you can always add arguments to the functions if necessary, such as na.rm, like this: mean(., na.rm = TRUE)
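
As a rough illustration of passing na.rm, here is how the starwars example might look with across() and where(), assuming dplyr 1.0.0 or later. Only mean and sd are shown, since min and max with na.rm = TRUE warn and return Inf or -Inf for groups where every value is missing.

```r
library(dplyr)

res <- starwars %>%
  group_by(homeworld) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        # lambdas let extra arguments such as na.rm be passed through
        mean = ~ mean(.x, na.rm = TRUE),
        sd   = ~ sd(.x, na.rm = TRUE)
      )
    )
  )

res
```

Groups whose values are all missing give NaN for the mean rather than NA, and groups with a single observation still give NA for sd.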

Solution 1, 2017-10-03 03:01:14

按因子级别的多个变量的特定摘要统计

问题描述

1 个解决方案

解决方案1 1 2017-10-03 03:01:14

解决方案1
1 2017-10-03 03:01:14