
More extensive summary with group_by

I have a dataset of COVID-19 patients that records each patient's vaccination status and whether they are dead or alive.

ID <- c(1:20)
Group <- c("1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc",
           "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc",
           "1. vacc + unvacc", "2. vacc")
Status <- c("Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", 
            "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive")

df <- data.frame(ID, Group, Status)

So far I have tried writing the code myself, and this is as far as I can get:

library(tidyverse)

df %>% 
  mutate_at("Group", as.character) %>%
  list(group_by(.,Group, Status), .) %>%
  map(~summarize(.,cnt = n())) %>%
  bind_rows() %>%
  replace_na(list(Group="Overall"))

This gives me the following output:

    `summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
# A tibble: 7 x 3
# Groups:   Group [4]
  Group            Status   cnt
  <chr>            <chr>  <int>
1 1. vacc + unvacc Alive      3
2 1. vacc + unvacc Dead       4
3 2. vacc          Alive      4
4 2. vacc          Dead       3
5 3. vacc          Alive      3
6 3. vacc          Dead       3
7 Overall          NA        20

The output I am looking for is this:

    `summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
    # A tibble: 12 x 3
    # Groups:   Group [4]
       Group            Status   cnt
       <chr>            <chr>  <int>
     1 1. vacc + unvacc Alive      3
     2 1. vacc + unvacc Dead       4
     3 1. vacc + unvacc All        7
     4 2. vacc          Alive      4
     5 2. vacc          Dead       3
     6 2. vacc          All        7
     7 3. vacc          Alive      3
     8 3. vacc          Dead       3
     9 3. vacc          All        6
    10 Overall          Alive     10
    11 Overall          Dead      10
    12 Overall          All       20

We could do it this way:

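  1. First we count, using the count function from dplyr. The benefit of count is that it combines group_by and summarise in one call.
  2. Then we reshape to wide format with pivot_wider from the tidyr package.
  3. Next we use the handy janitor package to get the row sums and column sums. (We could also do this with base R.)
  4. Finally we go back to long format with pivot_longer, naming the resulting columns.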
library(dplyr)
library(tidyr)
library(janitor)

df %>% 
  count(Group, Status) %>% 
  pivot_wider(
    names_from = Status,
    values_from = n
  ) %>% 
  adorn_totals("col", name = "All") %>% 
  adorn_totals("row", name = "Overall") %>% 
  pivot_longer(
    cols= -Group,
    names_to = "Status", 
    values_to = "cnt"
  )

   Group            Status   cnt
   <chr>            <chr>  <dbl>
 1 1. vacc + unvacc Alive      3
 2 1. vacc + unvacc Dead       4
 3 1. vacc + unvacc All        7
 4 2. vacc          Alive      4
 5 2. vacc          Dead       3
 6 2. vacc          All        7
 7 3. vacc          Alive      3
 8 3. vacc          Dead       3
 9 3. vacc          All        6
10 Overall          Alive     10
11 Overall          Dead      10
12 Overall          All       20
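
If you would rather not depend on janitor, the same table can probably be built with dplyr alone by counting at each level of aggregation and stacking the pieces. The following is only a sketch, not the approach used above: the intermediate names (by_both, by_group, by_status, overall, result) are made up for illustration, and the data frame is rebuilt with rep() on the assumption that it matches the df from the question.

```r
library(dplyr)

# Rebuild the example data from the question (assumed equivalent to the original df)
df <- data.frame(
  ID     = 1:20,
  Group  = rep(c("1. vacc + unvacc", "2. vacc", "3. vacc"), length.out = 20),
  Status = rep(c("Dead", "Alive"), length.out = 20)
)

# One count per level of aggregation; the label columns are filled in by hand
by_both   <- df %>% count(Group, Status, name = "cnt")
by_group  <- df %>% count(Group, name = "cnt") %>% mutate(Status = "All")
by_status <- df %>% count(Status, name = "cnt") %>% mutate(Group = "Overall")
overall   <- df %>% count(name = "cnt") %>% mutate(Group = "Overall", Status = "All")

# Stack the four pieces and order the rows as Alive, Dead, All within each group
result <- bind_rows(by_both, by_group, by_status, overall) %>%
  mutate(Status = factor(Status, levels = c("Alive", "Dead", "All"))) %>%
  arrange(Group, Status) %>%
  select(Group, Status, cnt)

print(result)
```

Here cnt stays an integer column, and Status becomes a factor purely to control the row order; convert it back with as.character() if you need a plain character column.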

