
How to add totals as well as group_by statistics in R

When computing a statistic with summarise and group_by, we only get the summary statistic per category, not the value for the whole population (Total). How can I get both?

I am looking for something clean and short. So far I can only think of:

bind_rows(
  iris %>% group_by(Species) %>% summarise(
    "Mean" = mean(Sepal.Width),
    "Median" = median(Sepal.Width),
    "sd" = sd(Sepal.Width),
    "p10" = quantile(Sepal.Width, probs = 0.1)),
  iris %>% summarise(
    "Mean" = mean(Sepal.Width),
    "Median" = median(Sepal.Width),
    "sd" = sd(Sepal.Width),
    "p10" = quantile(Sepal.Width, probs = 0.1)) %>%
    mutate(Species = "Total")
)

But I would like something more compact. In particular, I don't want to type the summarise code twice, once for the groups and once for the total.

You can simplify it if you untangle what you're trying to do: you have iris data covering several species, and you want it summarized alongside the data for all species combined. You don't need to calculate the summary stats before you bind. Instead, bind iris with a copy of iris in which Species has been set to "Total", then group and summarize once.

library(tidyverse)

bind_rows(
  iris,
  iris %>% mutate(Species = "Total")
) %>%
  group_by(Species) %>%
  summarise(Mean = mean(Sepal.Width),
            Median = median(Sepal.Width),
            sd = sd(Sepal.Width),
            p10 = quantile(Sepal.Width, probs = 0.1))
#> # A tibble: 4 x 5
#>   Species     Mean Median    sd   p10
#>   <chr>      <dbl>  <dbl> <dbl> <dbl>
#> 1 setosa      3.43    3.4 0.379  3   
#> 2 Total       3.06    3   0.436  2.5 
#> 3 versicolor  2.77    2.8 0.314  2.3 
#> 4 virginica   2.97    3   0.322  2.59

I like the caution in the comments above, though I do this sort of calculation for work often enough that I have a similar shorthand function in a personal package. It perhaps makes less sense for things like standard deviations, but it's something I need to do a lot when adding up totals of demographic groups, etc. (If it's useful, that function is here.)
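
The shorthand function mentioned above is not reproduced in this post, so the following is only a minimal sketch of what such a helper could look like. The name with_total, its arguments, and the choice to pass the grouping column as a string are all assumptions made for illustration, not the author's actual function.

```r
library(dplyr)

# Hypothetical helper (an assumption, not the function linked above):
# stack the data on top of a copy whose grouping column is overwritten
# with a total label, then group by that column.
with_total <- function(data, group_col, label = "Total") {
  total <- data
  total[[group_col]] <- label
  # Convert to character so factor levels and the label combine cleanly
  data[[group_col]] <- as.character(data[[group_col]])
  bind_rows(data, total) %>%
    group_by(across(all_of(group_col)))
}

res <- with_total(iris, "Species") %>%
  summarise(
    Mean = mean(Sepal.Width),
    Median = median(Sepal.Width),
    sd = sd(Sepal.Width),
    p10 = quantile(Sepal.Width, probs = 0.1)
  )
print(res)
```

The summary code is written once, and the "Total" row comes out of the same group_by as the per-species rows. The position of the "Total" row depends on how the group labels are sorted, so arrange the result explicitly if it must come last.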

A bit shorter, though quite similar to the bind_rows approach:

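# Helper: 10th percentile, so it can be referenced by name below
q10 <- function(x) quantile(x, probs = 0.1)

iris %>%
  select(Species, Sepal.Width) %>%
  group_by(Species) %>%
  summarise_all(c("mean", "sd", "q10")) %>%
  t() %>%  # t() turns the tibble into a matrix
  cbind(c("total", iris %>% select(Sepal.Width) %>% summarise_all(c("mean", "sd", "q10")))) %>%
  t()      # the final result is a matrix, not a data frame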

Probably cleaner:

bind_rows(
  iris %>%
    group_by(Species) %>%
    select(Sepal.Width) %>%
    summarise_all(c("mean", "sd", "q10")),
  iris %>%
    select(Sepal.Width) %>%
    summarise_all(c("mean", "sd", "q10")) %>%
    mutate(Species = "Total")
)
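
In current dplyr, summarise_all is superseded by across(). As a hedged sketch of the same idea, the statistics can be defined once in a named list and reused for both the grouped and the overall summary. The list name stats is an assumption for illustration, and the code assumes dplyr 1.0 or later.

```r
library(dplyr)

# One shared definition of the statistics (the name `stats` is illustrative)
stats <- list(
  mean = mean,
  sd   = sd,
  q10  = function(x) quantile(x, probs = 0.1, names = FALSE)
)

res <- bind_rows(
  # Per-species summary
  iris %>%
    group_by(Species) %>%
    summarise(across(Sepal.Width, stats, .names = "{.fn}")),
  # Overall summary, labelled as the total row
  iris %>%
    summarise(across(Sepal.Width, stats, .names = "{.fn}")) %>%
    mutate(Species = "Total")
)
print(res)
```

Here the "Total" row is appended last because it is bound after the grouped summary, and the statistics only need to be edited in one place.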
