
Summarising different variables to one column in R with dplyr

I have a dataset of 110 patients in R. For each patient, I have calculated whether the patient would qualify for testing with a certain test according to Criteria1, Criteria2 and Criteria3. The outcome of the test is recorded in TestResult. A sample of the data frame is as follows:

tibble::tribble(
     ~Patient.ID,       ~Criteria1, ~Criteria2,      ~Criteria3, ~TestResult,
              1L, "Include",         "Include", "Exclude",         "Positive",
              2L, "Include",         "Exclude", "Exclude",         "Negative",
              3L, "Include",         "Exclude", "Exclude",         "Positive",
              4L, "Include",         "Exclude", "Exclude",         "Negative",
              5L, "Include",         "Include", "Exclude",         "Negative",
              6L, "Include",         "Exclude", "Exclude",         "Positive",
              7L, "Include",         "Exclude", "Exclude",         "Negative",
              8L, "Include",         "Exclude", "Exclude",         "Negative",
              9L, "Include",         "Include", "Exclude",         "Positive",
             10L, "Include",         "Include", "Exclude",         "Positive"
     )

I would now like to summarise how many Positive and Negative results I would get if each criterion were applied, preferably using dplyr. My goal is to make a ggplot2 bar plot with Criteria on the x-axis (two bars per criterion, Positive and Negative, mapped to the fill aesthetic). Ideally, the summary table would look something like this:

Criteria    Criteria_Outcome    TestResult  Count           
Criteria1   Include             Positive    30
Criteria1   Include             Negative    80
Criteria2   Include             Positive    18
Criteria2   Include             Negative    46
Criteria3   Include             Positive    4
Criteria3   Include             Negative    8
Criteria1   Exclude             Positive    0
Criteria1   Exclude             Negative    0
Criteria2   Exclude             Positive    12
Criteria2   Exclude             Negative    34
Criteria3   Exclude             Positive    26
Criteria3   Exclude             Negative    72

The question is: how do I achieve this without concatenating three tables like the ones below? That approach feels manual. I would have to add a Criteria_Name column to each table before concatenating, and the first column of each table has a different variable name (the name of the criterion).

table1 <- df %>% count(Criteria1, TestResult)
table2 <- df %>% count(Criteria2, TestResult)
table3 <- df %>% count(Criteria3, TestResult)

I considered using summarise or count and obtained this result, but it is not quite what I am looking for:

summary_table <- df %>% count(Criteria1,Criteria2,Criteria3,TestResult)

  Criteria1        Criteria2   Criteria3         TestResult    n
1 Include          Exclude      Exclude          Negative      31
2 Include          Exclude      Exclude          Positive      10
3 Include          Exclude      Include          Negative      3
4 Include          Exclude      Include          Positive      2
5 Include          Include      Exclude          Negative      41
6 Include          Include      Exclude          Positive      16
7 Include          Include      Include          Negative      5
8 Include          Include      Include          Positive      2

Thank you in advance for your help!

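You can pivot the criteria columns into a pair of key and value columns (called name and value by default) with pivot_longer(). Each patient then contributes one row per criterion, so you can count rows by criterion, outcome and test result: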

library(tidyverse)

data <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

counts <-
  data %>%
  pivot_longer(starts_with("Criteria")) %>%
  count(name, value, TestResult, name = "Count")
counts
#> # A tibble: 8 × 4
#>   name      value   TestResult Count
#>   <chr>     <chr>   <chr>      <int>
#> 1 Criteria1 Include Negative       5
#> 2 Criteria1 Include Positive       5
#> 3 Criteria2 Exclude Negative       4
#> 4 Criteria2 Exclude Positive       2
#> 5 Criteria2 Include Negative       1
#> 6 Criteria2 Include Positive       3
#> 7 Criteria3 Exclude Negative       5
#> 8 Criteria3 Exclude Positive       5
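
The output above uses the default column names and omits combinations that never occur (for example Criteria1 with Exclude), whereas the table in the question has named columns and zero counts. A minimal sketch of one way to get closer to that layout, assuming that names_to and values_to are used to rename the pivoted columns and that tidyr's complete() fills in the missing combinations; the object name counts_full is made up for illustration:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the answer above
data <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

counts_full <-
  data %>%
  # Name the key/value columns as in the question's desired table
  pivot_longer(
    starts_with("Criteria"),
    names_to = "Criteria",
    values_to = "Criteria_Outcome"
  ) %>%
  count(Criteria, Criteria_Outcome, TestResult, name = "Count") %>%
  # Add every Criteria x Outcome x TestResult combination seen in the data,
  # giving a count of 0 to combinations that never occur
  complete(Criteria, Criteria_Outcome, TestResult, fill = list(Count = 0L))

counts_full
```

Note that complete() only expands over values that appear somewhere in the data, so an outcome or test result that is absent from every column would still be missing.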

counts %>%
  ggplot(aes(name, Count, fill = TestResult)) +
    geom_col()
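
By default geom_col() stacks the bars, and it draws the Include and Exclude rows into the same bar. The question asks for two bars per criterion, so here is a hedged variant. It assumes that only the Include rows are of interest ("if each criterion were applied") and that side-by-side bars are wanted; the name include_counts is made up for illustration:

```r
library(dplyr)
library(ggplot2)

# Counts copied from the printed output of the answer above
counts <- tibble::tribble(
  ~name,       ~value,    ~TestResult, ~Count,
  "Criteria1", "Include", "Negative",  5L,
  "Criteria1", "Include", "Positive",  5L,
  "Criteria2", "Exclude", "Negative",  4L,
  "Criteria2", "Exclude", "Positive",  2L,
  "Criteria2", "Include", "Negative",  1L,
  "Criteria2", "Include", "Positive",  3L,
  "Criteria3", "Exclude", "Negative",  5L,
  "Criteria3", "Exclude", "Positive",  5L
)

# Assumption: keep only patients who qualify under each criterion
include_counts <- counts %>%
  filter(value == "Include")

p <- ggplot(include_counts, aes(name, Count, fill = TestResult)) +
  # position = "dodge" places the Positive and Negative bars side by side
  geom_col(position = "dodge") +
  labs(x = "Criteria")
p
```

With this sample, Criteria3 has no Include rows and therefore does not appear on the x-axis. If both outcomes should be shown, adding facet_wrap(~ value) to the unfiltered counts is another option.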

Created on 2022-04-15 by the reprex package (v2.0.1)
