Summarising different variables to one column in R with dplyr
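I have a dataset in R with 110 patients. For each patient, I have worked out whether the patient is eligible for a certain test according to Criteria1, Criteria2 and Criteria3. The result of the test is recorded in TestResult. The data frame looks like this (first 10 rows):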
df <- tibble::tribble(
~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
1L, "Include", "Include", "Exclude", "Positive",
2L, "Include", "Exclude", "Exclude", "Negative",
3L, "Include", "Exclude", "Exclude", "Positive",
4L, "Include", "Exclude", "Exclude", "Negative",
5L, "Include", "Include", "Exclude", "Negative",
6L, "Include", "Exclude", "Exclude", "Positive",
7L, "Include", "Exclude", "Exclude", "Negative",
8L, "Include", "Exclude", "Exclude", "Negative",
9L, "Include", "Include", "Exclude", "Positive",
10L, "Include", "Include", "Exclude", "Positive"
)
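I now want to summarise how many Positive and Negative results I would get if each criterion were applied, preferably using dplyr. My goal is to make a ggplot2 bar chart with Criteria on the x-axis (two bars per criterion, Positive and Negative, with colour mapped to the fill aesthetic). Ideally, the summary table should look like this: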
Criteria   Criteria_Outcome  TestResult  Count
Criteria1  Include           Positive    30
Criteria1  Include           Negative    80
Criteria2  Include           Positive    18
Criteria2  Include           Negative    46
Criteria3  Include           Positive    4
Criteria3  Include           Negative    8
Criteria1  Exclude           Positive    0
Criteria1  Exclude           Negative    0
Criteria2  Exclude           Positive    12
Criteria2  Exclude           Negative    34
Criteria3  Exclude           Positive    26
Criteria3  Exclude           Negative    72
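The question is: how can I achieve this without joining three tables? That approach feels manual: I would have to add a Criteria_Name column to each table before combining them, and the first column of each table has a different variable name (the name of the criterion).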
table1 <- df %>% count(Criteria1,TestResult)
table2 <- df %>% count(Criteria2,TestResult)
table3 <- df %>% count(Criteria3,TestResult)
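For comparison, the manual route described above might look roughly like the sketch below. The helper count_one is hypothetical and only here for illustration; it counts one criterion column, renames it to a shared Criteria_Outcome column, and adds the criterion name so the three tables can be stacked. It runs on the 10-row sample, so the counts differ from the 110-patient table.

```r
library(dplyr)

df <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

# Hypothetical helper: count one criterion column under a common column name
count_one <- function(data, criterion) {
  data %>%
    count(Criteria_Outcome = .data[[criterion]], TestResult, name = "Count") %>%
    mutate(Criteria = criterion, .before = 1)
}

criteria <- c("Criteria1", "Criteria2", "Criteria3")

# Stack the three per-criterion tables into one
manual <- bind_rows(lapply(criteria, function(cn) count_one(df, cn)))
manual
```

This works, but every new criterion has to be listed by hand, which is the repetition I would like to avoid.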
I considered using summarise or count and got the result below, but it is not what I want:
summary_table <- df %>% count(Criteria1,Criteria2,Criteria3,TestResult)
Criteria1 Criteria2 Criteria3 TestResult n
1 Include Exclude Exclude Negative 31
2 Include Exclude Exclude Positive 10
3 Include Exclude Include Negative 3
4 Include Exclude Include Positive 2
5 Include Include Exclude Negative 41
6 Include Include Exclude Positive 16
7 Include Include Include Negative 5
8 Include Include Include Positive 2
Thanks in advance for your help!
You can pivot the multiple criteria columns into a single pair of key and value columns. This lets you count groups of rows:
library(tidyverse)
data <- tibble::tribble(
~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
1L, "Include", "Include", "Exclude", "Positive",
2L, "Include", "Exclude", "Exclude", "Negative",
3L, "Include", "Exclude", "Exclude", "Positive",
4L, "Include", "Exclude", "Exclude", "Negative",
5L, "Include", "Include", "Exclude", "Negative",
6L, "Include", "Exclude", "Exclude", "Positive",
7L, "Include", "Exclude", "Exclude", "Negative",
8L, "Include", "Exclude", "Exclude", "Negative",
9L, "Include", "Include", "Exclude", "Positive",
10L, "Include", "Include", "Exclude", "Positive"
)
counts <-
data %>%
pivot_longer(starts_with("Criteria")) %>%
count(name, value, TestResult, name = "Count")
counts
#> # A tibble: 8 × 4
#> name value TestResult Count
#> <chr> <chr> <chr> <int>
#> 1 Criteria1 Include Negative 5
#> 2 Criteria1 Include Positive 5
#> 3 Criteria2 Exclude Negative 4
#> 4 Criteria2 Exclude Positive 2
#> 5 Criteria2 Include Negative 1
#> 6 Criteria2 Include Positive 3
#> 7 Criteria3 Exclude Negative 5
#> 8 Criteria3 Exclude Positive 5
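If the column names and the zero-count rows from the desired table matter, a possible variation is sketched below. It assumes the names_to and values_to arguments of pivot_longer() for the column names, and tidyr::complete() to add combinations that never occur (such as Criteria1 with Exclude) with a count of 0. The object name counts_named is only illustrative, and the data are the same 10-row sample, so the numbers will not match the 110-patient table.

```r
library(dplyr)
library(tidyr)

data <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

counts_named <- data %>%
  # One row per patient and criterion, with the column names from the question
  pivot_longer(
    starts_with("Criteria"),
    names_to = "Criteria",
    values_to = "Criteria_Outcome"
  ) %>%
  count(Criteria, Criteria_Outcome, TestResult, name = "Count") %>%
  # Add the combinations that never occur, with a count of zero
  complete(Criteria, Criteria_Outcome, TestResult, fill = list(Count = 0L))

counts_named
```

With the sample data this gives 12 rows (3 criteria × 2 outcomes × 2 test results), including zero counts for Criteria1/Exclude and Criteria3/Include.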
counts %>%
ggplot(aes(name, Count, fill = TestResult)) +
geom_col()
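Note that geom_col() stacks bars by default, and the plot above also adds the Include and Exclude rows together. If you want two side-by-side bars per criterion for the included patients only, as described in the question, one option is sketched below. It assumes that filtering to value == "Include" is what "applying a criterion" means; the counts table is retyped from the output above so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)

# Counts copied from the printed output for the 10-row sample
counts <- tibble::tribble(
  ~name,       ~value,    ~TestResult, ~Count,
  "Criteria1", "Include", "Negative",  5L,
  "Criteria1", "Include", "Positive",  5L,
  "Criteria2", "Exclude", "Negative",  4L,
  "Criteria2", "Exclude", "Positive",  2L,
  "Criteria2", "Include", "Negative",  1L,
  "Criteria2", "Include", "Positive",  3L,
  "Criteria3", "Exclude", "Negative",  5L,
  "Criteria3", "Exclude", "Positive",  5L
)

# Keep only patients the criterion would include (assumption, see above)
included <- counts %>% filter(value == "Include")

# position = "dodge" places Positive and Negative next to each other
p <- ggplot(included, aes(name, Count, fill = TestResult)) +
  geom_col(position = "dodge")

p
```

In the sample no patient is included under Criteria3, so that criterion does not appear on the x-axis. To show Include and Exclude in separate panels instead of filtering, facet_wrap(~value) on the unfiltered counts is an alternative.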
Created on 2022-04-15 by the reprex package (v2.0.1)