Sankey diagram labels in R

Question

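Background

I am creating a Sankey diagram in R and am struggling to label the nodes.

As an example, I will reuse a dataset of 10 hypothetical patients who were screened for COVID-19. At baseline, all patients are negative for COVID-19. Suppose that after 1 week all patients are tested again: now 3 patients are positive, 6 are negative, and 1 has an inconclusive result. After another week, the 3 positive patients remain positive, 1 patient goes from negative to positive, and the others are negative.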

data <- data.frame(patient = 1:10, 
                   baseline = rep("neg", 10), 
                   test1 = c(rep("pos",3), rep("neg", 6), "inconcl"), 
                   test2 = c( rep(NA, 3), "pos", rep("neg", 6) ))
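
As a quick sanity check on the numbers the node labels should end up showing, the per-test counts can be tabulated with base R. This is only a sketch; `stage_counts` is a name introduced here for illustration. Note that in this data the three patients who were positive at test1 are coded as NA at test2.

```r
# Same example data as above
data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))

# Count patients per result at each time point, keeping NA as its own category
# (stage_counts is a hypothetical name, not part of the original post)
stage_counts <- lapply(data[c("baseline", "test1", "test2")],
                       table, useNA = "ifany")
stage_counts
```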

Attempt

To create the Sankey diagram, I am using the ggsankey package:

library(tidyverse)
#devtools::install_github("davidsjoberg/ggsankey")
library(ggsankey)  # provides make_long() and geom_sankey()
df <- data %>%
  make_long(baseline, test1, test2)

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node,
               fill = factor(node), label = node)) +
  geom_sankey() +
  geom_sankey_label(aes(fill = factor(node)), size = 3, color = "white") +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())

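Problem

I would like to label each node with the number of patients in it (for example, the first node would be labelled 10, the inconclusive node would be labelled 1, and so on).

How can this be done in R without hard-coding the values?

Partial solution

To extract the numbers from the data, I think the first step would be something like this: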

data %>% count(baseline, test1, test2)
#  baseline   test1 test2 n
#1      neg inconcl   neg 1
#2      neg     neg   neg 5
#3      neg     neg   pos 1
#4      neg     pos  <NA> 3

I think that if I could include the correct values in an extra column of the long data df, I should be able to use label = variable_name in the aesthetics?
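
A minimal sketch of that idea: once the data are in long form, the count for a node is just the number of rows sharing the same time point and node. The code below uses `tidyr::pivot_longer()` as a stand-in for `make_long()` (an assumption made so the example runs without ggsankey; it yields only `x` and `node`, not `next_x` or `next_node`) and `dplyr::add_count()` to attach the count as an extra column. `long_counts` and `n_patients` are names chosen for illustration.

```r
library(dplyr)
library(tidyr)

data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))

# Reshape to one row per patient and time point (stand-in for make_long()),
# drop missing results, then add the number of patients per (x, node)
long_counts <- data %>%
  pivot_longer(c(baseline, test1, test2), names_to = "x", values_to = "node") %>%
  filter(!is.na(node)) %>%
  add_count(x, node, name = "n_patients")

# One row per node with its count
distinct(long_counts, x, node, n_patients)
```

With the real `df` from `make_long()`, the same `add_count(x, node, name = "n_patients")` call should apply, and the new column could then be mapped with `aes(label = n_patients)`.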

Answer 1

Try this:

library(ggplot2)
library(ggsankey)
library(dplyr)


# create a count data frame for each node

df_nr <- 
  df %>% 
  filter(!is.na(node)) %>% 
  group_by(x, node) %>% 
  summarise(count = n(), .groups = "drop")

# join to sankey dataframe (explicit keys avoid the "Joining, by" message)

df <- 
  df %>% 
  left_join(df_nr, by = c("x", "node"))




ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node,
               fill = factor(node))) +
  geom_sankey() +
  geom_sankey_label(aes(label = node), size = 3, color = "white") +
  geom_sankey_text(aes(label = count), size = 3.5, vjust = -1.5, check_overlap = TRUE) +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme_minimal()+
  theme(legend.position = "bottom",
        legend.title = element_blank())

Data

data <- data.frame(patient = 1:10, 
                   baseline = rep("neg", 10), 
                   test1 = c(rep("pos",3), rep("neg", 6), "inconcl"), 
                   test2 = c( rep(NA, 3), "pos", rep("neg", 6) ))
df <- data %>%
  make_long(baseline, test1, test2)

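You could adjust the position of the count label, or change it to a label geom if you want a bounding box (not too sure whether that helps). I am also not sure whether geom_sankey_label recognises check_overlap to avoid the count text being drawn several times on top of itself.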
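
One possible variation, sketched here under assumptions rather than taken from the answer: put the node name and the count into a single string, so that only one label layer is needed. As before, `pivot_longer()` stands in for `make_long()` so the code runs without ggsankey, and `node_labels` and `node_label` are hypothetical names.

```r
library(dplyr)
library(tidyr)

data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))

# One row per (time point, node) with its patient count
node_labels <- data %>%
  pivot_longer(c(baseline, test1, test2), names_to = "x", values_to = "node") %>%
  filter(!is.na(node)) %>%
  count(x, node, name = "count") %>%
  # Combine name and count into one string, e.g. "neg (n = 10)"
  mutate(node_label = sprintf("%s (n = %d)", node, count))

node_labels
```

Applied to the `df` from `make_long()` and joined back by `x` and `node`, such a column could be mapped with `aes(label = node_label)` in `geom_sankey_label()`, which should give one boxed label per node instead of separate name and count layers.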

Created on 2021-04-20 by the reprex package (v2.0.0)

Solution 1
2 · Accepted · 2021-04-20 15:21:56
