
Specific Summary Statistics for Multiple Variables by Factor Level

I am trying to get the mean, sd, min, max, and range for the mpg, price, weight, and repair record grouped by two factor levels (domestic and foreign) within a variable called foreign. I've come across many examples that show how to get one statistic like mean on multiple variables, or how to get multiple statistics for one variable grouped by two factor levels. However, I haven't found anything particularly useful for developing the table that I've described above.

I've tried many things and it appears that ddply might be what I should be using. I think it should be something like ddply(df,[column I want to use as factor level], mean=mean(value),... but am unsure of the syntax. Thanks for any help!

I would favour a tidyverse approach, such as:

library(tibble)
library(dplyr)

mtcars %>%
  rownames_to_column() %>%
  as_tibble() %>%
  group_by(rowname) %>%
  summarise_all(
    # list() replaces funs(), which is deprecated as of dplyr 0.8.0
    list(mean = mean, median = median, min = min, max = max, sd = sd)
  )

# # A tibble: 32 x 56
#              rowname mpg_mean cyl_mean disp_mean hp_mean drat_mean wt_mean qsec_mean
#                <chr>    <dbl>    <dbl>     <dbl>   <dbl>     <dbl>   <dbl>     <dbl>
# 1        AMC Javelin     15.2        8     304.0     150      3.15   3.435     17.30
# 2 Cadillac Fleetwood     10.4        8     472.0     205      2.93   5.250     17.98
# 3         Camaro Z28     13.3        8     350.0     245      3.73   3.840     15.41
# 4  Chrysler Imperial     14.7        8     440.0     230      3.23   5.345     17.42
# 5         Datsun 710     22.8        4     108.0      93      3.85   2.320     18.61
# 6   Dodge Challenger     15.5        8     318.0     150      2.76   3.520     16.87
# 7         Duster 360     14.3        8     360.0     245      3.21   3.570     15.84
# 8       Ferrari Dino     19.7        6     145.0     175      3.62   2.770     15.50
# 9           Fiat 128     32.4        4      78.7      66      4.08   2.200     19.47
# 10         Fiat X1-9     27.3        4      79.0      66      4.08   1.935     18.90
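Closer to the table asked for in the question, here is a minimal sketch that groups by a two-level factor and computes mean, sd, min, max, and range for a few chosen variables. The asker's data set is not shown, so this uses mtcars as a stand-in; the `gearbox` column and the choice of `mpg`, `hp`, and `wt` are assumptions for illustration only. It also assumes dplyr 1.0.0 or later, where across() is available.

```r
library(dplyr)

# Stand-in data: mtcars with `am` recoded as a two-level factor.
# `gearbox` is a hypothetical substitute for the asker's `foreign` variable.
df <- mtcars %>%
  mutate(gearbox = factor(am, levels = c(0, 1),
                          labels = c("automatic", "manual")))

stats_by_group <- df %>%
  group_by(gearbox) %>%
  summarise(
    across(
      c(mpg, hp, wt),                      # variables to summarise
      list(
        mean  = ~ mean(.x),
        sd    = ~ sd(.x),
        min   = ~ min(.x),
        max   = ~ max(.x),
        range = ~ max(.x) - min(.x)        # range as a single number
      )
    )
  )

stats_by_group
```

The result has one row per factor level and one column per variable/statistic pair, named like `mpg_mean` or `wt_range`. Note that base R's range() returns two values (min and max), which is why the sketch computes the difference explicitly.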

Alternatively, use summarise_if with the is.numeric predicate:

library(dplyr)

starwars %>%
  group_by(homeworld) %>%
  summarise_if(
    is.numeric,
    # list() replaces funs(), which is deprecated as of dplyr 0.8.0
    list(mean = mean, median = median, min = min, max = max, sd = sd)
  )

# # A tibble: 49 x 16
#        homeworld height_mean mass_mean birth_year_mean height_median mass_median birth_year_median height_min
#            <chr>       <dbl>     <dbl>           <dbl>         <dbl>       <dbl>             <dbl>      <dbl>
# 1       Alderaan    176.3333        NA              NA           188          NA                NA        150
# 2    Aleen Minor     79.0000      15.0              NA            79        15.0                NA         79
# 3         Bespin    175.0000      79.0              37           175        79.0                37        175
# 4     Bestine IV    180.0000     110.0              NA           180       110.0                NA        180
# 5 Cato Neimoidia    191.0000      90.0              NA           191        90.0                NA        191
# 6          Cerea    198.0000      82.0              92           198        82.0                92        198
# 7       Champala    196.0000        NA              NA           196          NA                NA        196
# 8      Chandrila    150.0000        NA              48           150          NA                48        150
# 9   Concord Dawn    183.0000      79.0              66           183        79.0                66        183
# 10      Corellia    175.0000      78.5              25           175        78.5                25        170

You can always add arguments to the functions if necessary, such as na.rm. Inside funs() this is written as mean(., na.rm = TRUE); inside list() (dplyr 0.8.0 and later) it is written as a formula, ~ mean(., na.rm = TRUE).
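As a rough illustration of passing na.rm, the sketch below summarises two columns of the built-in airquality data set, both of which contain missing values, grouped by Month. The data set and column choice are assumptions made for the example, and across() again requires dplyr 1.0.0 or later.

```r
library(dplyr)

# airquality ships with base R; Ozone and Solar.R both contain NAs.
na_safe <- airquality %>%
  group_by(Month) %>%
  summarise(
    across(
      c(Ozone, Solar.R),
      list(
        mean = ~ mean(.x, na.rm = TRUE),   # drop NAs before averaging
        sd   = ~ sd(.x, na.rm = TRUE)
      )
    )
  )

na_safe
```

Without na.rm = TRUE, any group containing a missing value would return NA for that statistic, as in the starwars output above.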

