I have a dataset of 110 patients in R, and for each patient I have calculated whether they would qualify for a certain test according to Criteria1, Criteria2 and Criteria3. The test outcome is recorded in TestResult. Here are the first 10 rows of the data frame:
df <- tibble::tribble(
~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
1L, "Include", "Include", "Exclude", "Positive",
2L, "Include", "Exclude", "Exclude", "Negative",
3L, "Include", "Exclude", "Exclude", "Positive",
4L, "Include", "Exclude", "Exclude", "Negative",
5L, "Include", "Include", "Exclude", "Negative",
6L, "Include", "Exclude", "Exclude", "Positive",
7L, "Include", "Exclude", "Exclude", "Negative",
8L, "Include", "Exclude", "Exclude", "Negative",
9L, "Include", "Include", "Exclude", "Positive",
10L, "Include", "Include", "Exclude", "Positive"
)
I would now like to summarise how many Positive and Negative results I would have if each criterion were applied, preferably using dplyr. My goal is to make a ggplot2 bar plot with the criteria on the x-axis and two bars per criterion (Positive and Negative, mapped to the fill aesthetic). Ideally, the summary should look something like this:
Criteria   Criteria_Outcome  TestResult  Count
Criteria1  Include           Positive       30
Criteria1  Include           Negative       80
Criteria2  Include           Positive       18
Criteria2  Include           Negative       46
Criteria3  Include           Positive        4
Criteria3  Include           Negative        8
Criteria1  Exclude           Positive        0
Criteria1  Exclude           Negative        0
Criteria2  Exclude           Positive       12
Criteria2  Exclude           Negative       34
Criteria3  Exclude           Positive       26
Criteria3  Exclude           Negative       72
The question is: how do I achieve this without concatenating three tables? That approach feels manual: I'd have to add a Criteria_Name column to each table before concatenating, and the first column of each table has a different variable name (the name of the criterion).
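library(dplyr)
# One count table per criterion; each table's first column is named after its criterion
table1 <- df %>% count(Criteria1, TestResult)
table2 <- df %>% count(Criteria2, TestResult)
table3 <- df %>% count(Criteria3, TestResult)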
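For comparison, here is a sketch of the manual route described above, completed with dplyr alone. It is not necessarily what the asker had in mind. It renames each criterion column to a shared name inside count() (Criteria_Outcome is borrowed from the desired table and is otherwise an assumption) and lets bind_rows() record which criterion each row came from via its .id argument, so no Criteria_Name column has to be added by hand. The data is the 10-row excerpt above.

```r
library(dplyr)

# 10-row excerpt of the question's data
df <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

# Naming the grouping expression gives every table the same first column name
table1 <- df %>% count(Criteria_Outcome = Criteria1, TestResult)
table2 <- df %>% count(Criteria_Outcome = Criteria2, TestResult)
table3 <- df %>% count(Criteria_Outcome = Criteria3, TestResult)

# The argument names become the values of the new "Criteria" column
combined <- bind_rows(
  Criteria1 = table1,
  Criteria2 = table2,
  Criteria3 = table3,
  .id = "Criteria"
)
combined
```

This still needs one line per criterion, which is exactly the repetition the question wants to avoid; reshaping to long format removes it.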
I considered using summarise or count and got the result below, but it is not quite what I am looking for:
summary_table <- df %>% count(Criteria1,Criteria2,Criteria3,TestResult)
Criteria1 Criteria2 Criteria3 TestResult n
1 Include Exclude Exclude Negative 31
2 Include Exclude Exclude Positive 10
3 Include Exclude Include Negative 3
4 Include Exclude Include Positive 2
5 Include Include Exclude Negative 41
6 Include Include Exclude Positive 16
7 Include Include Include Negative 5
8 Include Include Include Positive 2
Thank you in advance for your help!
You can pivot the criteria columns into a key column and a value column. This lets you count the row groups in a single step:
library(tidyverse)
data <- tibble::tribble(
~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
1L, "Include", "Include", "Exclude", "Positive",
2L, "Include", "Exclude", "Exclude", "Negative",
3L, "Include", "Exclude", "Exclude", "Positive",
4L, "Include", "Exclude", "Exclude", "Negative",
5L, "Include", "Include", "Exclude", "Negative",
6L, "Include", "Exclude", "Exclude", "Positive",
7L, "Include", "Exclude", "Exclude", "Negative",
8L, "Include", "Exclude", "Exclude", "Negative",
9L, "Include", "Include", "Exclude", "Positive",
10L, "Include", "Include", "Exclude", "Positive"
)
counts <-
data %>%
pivot_longer(starts_with("Criteria")) %>%
count(name, value, TestResult, name = "Count")
counts
#> # A tibble: 8 × 4
#> name value TestResult Count
#> <chr> <chr> <chr> <int>
#> 1 Criteria1 Include Negative 5
#> 2 Criteria1 Include Positive 5
#> 3 Criteria2 Exclude Negative 4
#> 4 Criteria2 Exclude Positive 2
#> 5 Criteria2 Include Negative 1
#> 6 Criteria2 Include Positive 3
#> 7 Criteria3 Exclude Negative 5
#> 8 Criteria3 Exclude Positive 5
counts %>%
ggplot(aes(name, Count, fill = TestResult)) +
geom_col()
Created on 2022-04-15 by the reprex package (v2.0.1)
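The counts above omit combinations that never occur (for example Criteria1 / Exclude), whereas the table in the question lists them with a count of 0. Also, geom_col() stacks bars by default, so each bar in the plot above adds the Include and Exclude rows together. A sketch that addresses both, assuming tidyr's complete() is acceptable: it fills the missing combinations with 0, then plots only the Include rows with side-by-side (dodged) bars, one pair per criterion. The column names Criteria and Criteria_Outcome come from the question's desired table.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

data <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

counts <- data %>%
  # One row per patient and criterion, with descriptive column names
  pivot_longer(starts_with("Criteria"),
               names_to = "Criteria", values_to = "Criteria_Outcome") %>%
  count(Criteria, Criteria_Outcome, TestResult, name = "Count") %>%
  # Add every Criteria x Criteria_Outcome x TestResult combination, using 0 where absent
  complete(Criteria, Criteria_Outcome, TestResult, fill = list(Count = 0L))

# Keep only the patients each criterion would include; dodge puts the bars side by side
p <- counts %>%
  filter(Criteria_Outcome == "Include") %>%
  ggplot(aes(Criteria, Count, fill = TestResult)) +
  geom_col(position = "dodge")
```

Calling print(p) draws the plot. With the full 110-patient data, counts should have the same 12-row shape as the table in the question.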