
More extensive summary with group_by

I have a dataset of COVID-19 patients that contains their vaccination status and whether they are dead or alive.

ID <- c(1:20)
Group <- c("1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc",
           "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc", "1. vacc + unvacc", "2. vacc", "3. vacc",
           "1. vacc + unvacc", "2. vacc")
Status <- c("Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", 
            "Dead", "Alive", "Dead", "Alive", "Dead", "Alive", "Dead", "Alive")

df <- data.frame(ID, Group, Status)

So far, I've tried to write some code, and this is as far as I can get:

library(tidyverse)

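df %>% 
  mutate_at("Group", as.character) %>%
  # one copy grouped by Group and Status, plus one ungrouped copy
  list(group_by(., Group, Status), .) %>%
  map(~summarize(., cnt = n())) %>%
  bind_rows() %>%
  # the ungrouped copy has no Group value, so label it "Overall"
  replace_na(list(Group = "Overall"))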

This gives me the following output:

    `summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
# A tibble: 7 x 3
# Groups:   Group [4]
  Group            Status   cnt
  <chr>            <chr>  <int>
1 1. vacc + unvacc Alive      3
2 1. vacc + unvacc Dead       4
3 2. vacc          Alive      4
4 2. vacc          Dead       3
5 3. vacc          Alive      3
6 3. vacc          Dead       3
7 Overall          NA        20

The output I'm looking for is this:

    `summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.
    # A tibble: 12 x 3
    # Groups:   Group [4]
      Group            Status   cnt
      <chr>            <chr>  <int>
     1 1. vacc + unvacc Alive      3
     2 1. vacc + unvacc Dead       4
     3 1. vacc + unvacc All        7
     4 2. vacc          Alive      4
     5 2. vacc          Dead       3
     6 2. vacc          All        7
     7 3. vacc          Alive      3
     8 3. vacc          Dead       3
     9 3. vacc          All        6
    10 Overall          Alive     10
    11 Overall          Dead      10
    12 Overall          All       20 
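The attempt above yields a single "Overall" row because the second element of `list()` is the whole, ungrouped data, which `summarize()` collapses into one count. A minimal sketch that keeps the same `list()` / `map()` / `bind_rows()` idea, assuming dplyr, tidyr and purrr are installed, adds two more groupings: by `Group` only (giving the "All" rows) and by `Status` only (giving the "Overall" rows). Because the bare `.` is a direct argument of `list()`, magrittr does not also insert the data as the first argument. The name `res` is just for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

# The question's data, rebuilt with rep() (same 20 rows as in the question)
df <- data.frame(
  ID     = 1:20,
  Group  = rep(c("1. vacc + unvacc", "2. vacc", "3. vacc"), length.out = 20),
  Status = rep(c("Dead", "Alive"), length.out = 20),
  stringsAsFactors = FALSE
)

res <- df %>%
  # four copies of the data, each grouped at a different level;
  # the bare `.` (ungrouped) yields the grand total
  list(
    group_by(., Group, Status),
    group_by(., Group),
    group_by(., Status),
    .
  ) %>%
  map(~ summarise(.x, cnt = n(), .groups = "drop")) %>%
  # columns a copy was not grouped by are filled with NA by bind_rows()
  bind_rows() %>%
  replace_na(list(Group = "Overall", Status = "All")) %>%
  # put "All" after "Alive" and "Dead" within each group
  arrange(Group, factor(Status, levels = c("Alive", "Dead", "All")))

res
```

Every row comes from a plain count at some grouping level, so no reshaping is needed; the final `arrange()` only restores the requested row order.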

We could do it this way:

  1. First we count. We use the count function from dplyr. The nice thing about count is that it combines group_by and summarise in a single call.
  2. Then we reshape to wide format with pivot_wider from the tidyr package.
  3. Next we use the handy janitor package to add row and column totals with adorn_totals. (We could also do this with base R.)
  4. Then we go back to long format with pivot_longer, naming the new columns Status and cnt.

library(dplyr)
library(tidyr)
library(janitor)

df %>% 
  # 1. count rows per Group/Status combination (column n)
  count(Group, Status) %>% 
  # 2. one column per Status value
  pivot_wider(
    names_from = Status,
    values_from = n
  ) %>% 
  # 3. add a total column "All" and a total row "Overall"
  adorn_totals("col", name = "All") %>% 
  adorn_totals("row", name = "Overall") %>% 
  # 4. back to long format
  pivot_longer(
    cols = -Group,
    names_to = "Status", 
    values_to = "cnt"
  )

   Group            Status   cnt
   <chr>            <chr>  <dbl>
 1 1. vacc + unvacc Alive      3
 2 1. vacc + unvacc Dead       4
 3 1. vacc + unvacc All        7
 4 2. vacc          Alive      4
 5 2. vacc          Dead       3
 6 2. vacc          All        7
 7 3. vacc          Alive      3
 8 3. vacc          Dead       3
 9 3. vacc          All        6
10 Overall          Alive     10
11 Overall          Dead      10
12 Overall          All       20
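The answer notes that the janitor step could also be done in base R. A minimal base-R sketch, assuming only the `df` from the question and no packages at all, cross-tabulates with `table()`, adds the totals with `addmargins()` (which names them "Sum", renamed here to "Overall" and "All"), and converts back to long format with `as.data.frame()`. The names `tab` and `res` are illustrative.

```r
# The question's data, rebuilt with rep() (same 20 rows as in the question)
df <- data.frame(
  ID     = 1:20,
  Group  = rep(c("1. vacc + unvacc", "2. vacc", "3. vacc"), length.out = 20),
  Status = rep(c("Dead", "Alive"), length.out = 20),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows = Group, columns = Status
tab <- table(df[c("Group", "Status")])

# Add a total row and a total column (both named "Sum" by default)
tab <- addmargins(tab)
rownames(tab)[rownames(tab) == "Sum"] <- "Overall"
colnames(tab)[colnames(tab) == "Sum"] <- "All"

# Back to long format: one row per Group/Status combination
res <- as.data.frame(tab, responseName = "cnt", stringsAsFactors = FALSE)

# Sort by Group, with Alive, Dead, All inside each group
res <- res[order(res$Group, match(res$Status, c("Alive", "Dead", "All"))), ]
rownames(res) <- NULL
res
```

`as.data.frame()` on a table lists the first dimension fastest, so the explicit `order()` call is what produces the Alive, Dead, All sequence within each group.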
