
Summarising different variables to one column in R with dplyr

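I have a dataset of 110 patients in R. For each patient, I have calculated whether they are eligible for a certain test according to Criteria1, Criteria2, and Criteria3. The result of the test is recorded in TestResult. The data frame looks like this (first ten rows):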

# Assigned to df so that the count() calls further down can run against it
df <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
           1L,  "Include",  "Include",  "Exclude",  "Positive",
           2L,  "Include",  "Exclude",  "Exclude",  "Negative",
           3L,  "Include",  "Exclude",  "Exclude",  "Positive",
           4L,  "Include",  "Exclude",  "Exclude",  "Negative",
           5L,  "Include",  "Include",  "Exclude",  "Negative",
           6L,  "Include",  "Exclude",  "Exclude",  "Positive",
           7L,  "Include",  "Exclude",  "Exclude",  "Negative",
           8L,  "Include",  "Exclude",  "Exclude",  "Negative",
           9L,  "Include",  "Include",  "Exclude",  "Positive",
          10L,  "Include",  "Include",  "Exclude",  "Positive"
)

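I now want to summarise how many Positive and Negative results I would have if each criterion were applied, preferably using dplyr. My goal is to make a ggplot2 bar chart with Criteria on the x-axis (two bars per criterion, Positive and Negative, with TestResult mapped to the fill aesthetic). Ideally, the summary table should look like this: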

Criteria    Criteria_Outcome    TestResult  Count           
Criteria1   Include             Positive    30
Criteria1   Include             Negative    80
Criteria2   Include             Positive    18
Criteria2   Include             Negative    46
Criteria3   Include             Positive    4
Criteria3   Include             Negative    8
Criteria1   Exclude             Positive    0
Criteria1   Exclude             Negative    0
Criteria2   Exclude             Positive    12
Criteria2   Exclude             Negative    34
Criteria3   Exclude             Positive    26
Criteria3   Exclude             Negative    72

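The question is: how can I achieve this without joining three tables? That approach feels manual, because I would have to add a Criteria_Name column to each table before joining, and the first column of each table has a different variable name (the name of the criterion).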

table1 <- df %>% count(Criteria1,TestResult)
table2 <- df %>% count(Criteria2,TestResult)
table3 <- df %>% count(Criteria3,TestResult)

I thought about using summarise or count and got the result below, but it is not what I want:

summary_table <- df %>% count(Criteria1,Criteria2,Criteria3,TestResult)

  Criteria1        Criteria2   Criteria3         TestResult    n
1 Include          Exclude      Exclude          Negative      31
2 Include          Exclude      Exclude          Positive      10
3 Include          Exclude      Include          Negative      3
4 Include          Exclude      Include          Positive      2
5 Include          Include      Exclude          Negative      41
6 Include          Include      Exclude          Positive      16
7 Include          Include      Include          Negative      5
8 Include          Include      Include          Positive      2

Thanks in advance for your help!

You can pivot the multiple criteria columns into a single key column and a single value column. This lets you count groups of rows:

library(tidyverse)

data <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

counts <-
  data %>%
  pivot_longer(starts_with("Criteria")) %>%
  count(name, value, TestResult, name = "Count")
counts
#> # A tibble: 8 × 4
#>   name      value   TestResult Count
#>   <chr>     <chr>   <chr>      <int>
#> 1 Criteria1 Include Negative       5
#> 2 Criteria1 Include Positive       5
#> 3 Criteria2 Exclude Negative       4
#> 4 Criteria2 Exclude Positive       2
#> 5 Criteria2 Include Negative       1
#> 6 Criteria2 Include Positive       3
#> 7 Criteria3 Exclude Negative       5
#> 8 Criteria3 Exclude Positive       5
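
The result above uses the default column names `name` and `value` and omits combinations that never occur, whereas the table in the question has `Criteria` and `Criteria_Outcome` columns and rows with a count of 0. A minimal sketch of one way to get closer to that layout, assuming `tidyr::complete()` is acceptable for filling in the missing combinations (the object name `counts_full` is made up for illustration):

```r
library(dplyr)
library(tidyr)

# Same ten example rows as above
data <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

counts_full <- data %>%
  # Name the key and value columns as in the desired table
  pivot_longer(
    starts_with("Criteria"),
    names_to = "Criteria",
    values_to = "Criteria_Outcome"
  ) %>%
  count(Criteria, Criteria_Outcome, TestResult, name = "Count") %>%
  # Add every Criteria x Criteria_Outcome x TestResult combination;
  # combinations absent from the data get Count = 0
  complete(Criteria, Criteria_Outcome, TestResult, fill = list(Count = 0L))

counts_full
```

With the ten example rows this should give 12 rows (3 criteria x 2 outcomes x 2 test results), including zero counts for combinations such as Criteria1 / Exclude.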

counts %>%
  ggplot(aes(name, Count, fill = TestResult)) +
    geom_col()
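
Note that `geom_col()` stacks bars by default, and the plot above counts Include and Exclude rows together. The question asks for two bars per criterion, so a possible variation is sketched below. It assumes that only the rows where the criterion says "Include" are of interest (an interpretation of "if each criterion were applied") and uses `position = "dodge"` to place the bars side by side; the names `included` and `p` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same ten example rows as above
data <- tibble::tribble(
  ~Patient.ID, ~Criteria1, ~Criteria2, ~Criteria3, ~TestResult,
  1L, "Include", "Include", "Exclude", "Positive",
  2L, "Include", "Exclude", "Exclude", "Negative",
  3L, "Include", "Exclude", "Exclude", "Positive",
  4L, "Include", "Exclude", "Exclude", "Negative",
  5L, "Include", "Include", "Exclude", "Negative",
  6L, "Include", "Exclude", "Exclude", "Positive",
  7L, "Include", "Exclude", "Exclude", "Negative",
  8L, "Include", "Exclude", "Exclude", "Negative",
  9L, "Include", "Include", "Exclude", "Positive",
  10L, "Include", "Include", "Exclude", "Positive"
)

included <- data %>%
  pivot_longer(
    starts_with("Criteria"),
    names_to = "Criteria",
    values_to = "Criteria_Outcome"
  ) %>%
  count(Criteria, Criteria_Outcome, TestResult, name = "Count") %>%
  # Assumption: keep only patients the criterion would include
  filter(Criteria_Outcome == "Include")

# One Positive and one Negative bar per criterion, side by side
p <- ggplot(included, aes(Criteria, Count, fill = TestResult)) +
  geom_col(position = "dodge")
p
```

In the ten example rows no patient is included by Criteria3, so that criterion does not appear on the x-axis here; with the full 110-patient dataset it would. To show Include and Exclude in separate panels instead of filtering, `facet_wrap(~ Criteria_Outcome)` is another option.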

Created on 2022-04-15 by the reprex package (v2.0.1)
