Sankey diagram labels in R

Question

Background

I am creating a Sankey diagram in R and am struggling to label the nodes.

As an example, I will reuse a dataset of 10 imaginary patients who are screened for COVID-19. At baseline, all patients are negative for COVID-19. After, say, 1 week, all patients are tested again: now 3 patients are positive, 6 are negative and 1 has an inconclusive result. Another week later, the 3 positive patients remain positive, 1 patient goes from negative to positive, and the others are negative.

data <- data.frame(patient = 1:10, 
                   baseline = rep("neg", 10), 
                   test1 = c(rep("pos",3), rep("neg", 6), "inconcl"), 
                   test2 = c( rep(NA, 3), "pos", rep("neg", 6) ))

Attempt

To create the Sankey diagram, I am using the ggsankey package:

library(tidyverse)
library(ggsankey)  # provides make_long(), geom_sankey() and geom_sankey_label()
#devtools::install_github("davidsjoberg/ggsankey")
df <- data %>%
  make_long(baseline, test1, test2)

ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node,
               fill = factor(node), label = node)) +
  geom_sankey() +
  geom_sankey_label(aes(fill = factor(node)), size = 3, color = "white") +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())

Question

I would like to label each node with the number of patients in it (e.g., the first node would be labeled 10, the inconclusive node would be labeled 1, and so on).

How do you do this in R without hardcoding the values?

Parts of solution

To extract the numbers from the data, I thought the initial step should be something like:

data %>% count(baseline, test1, test2)
#  baseline   test1 test2 n
#1      neg inconcl   neg 1
#2      neg     neg   neg 5
#3      neg     neg   pos 1
#4      neg     pos  <NA> 3

I think that if I can add the proper values as an extra column of the long data df, I should be able to use label = variable_name in the aesthetics?
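The per-node counts that such a column would hold can be checked independently of ggsankey. The following is only a sketch: it reshapes the example data with tidyr::pivot_longer() (an assumption made here so the block runs without ggsankey, standing in for make_long()), and the name node_counts is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question
data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))

# One row per patient and time point, then count patients per (x, node).
# pivot_longer() is used here only as a stand-in for make_long().
node_counts <- data %>%
  pivot_longer(c(baseline, test1, test2), names_to = "x", values_to = "node") %>%
  filter(!is.na(node)) %>%   # patients without a test2 result do not form a node
  count(x, node)

print(node_counts)
```

Counting per time point and node, rather than per combination of baseline, test1 and test2, gives one number per node, which is the value a node label needs.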

Answer 1

Try this:

library(ggplot2)
library(ggsankey)
library(dplyr)


# create a count data frame for each node

df_nr <- 
  df %>% 
  filter(!is.na(node)) %>% 
  group_by(x, node)%>% 
  summarise(count = n())
#> `summarise()` has grouped output by 'x'. You can override using the `.groups` argument.

# join the counts to the sankey data frame; naming the keys avoids the join message

df <- 
  df %>% 
  left_join(df_nr, by = c("x", "node"))




ggplot(df, aes(x = x, next_x = next_x, node = node, next_node = next_node,
               fill = factor(node))) +
  geom_sankey() +
  geom_sankey_label(aes(label = node), size = 3, color = "white") +
  geom_sankey_text(aes(label = count), size = 3.5, vjust = -1.5, check_overlap = TRUE) +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme_minimal()+
  theme(legend.position = "bottom",
        legend.title = element_blank())

Data

data <- data.frame(patient = 1:10, 
                   baseline = rep("neg", 10), 
                   test1 = c(rep("pos",3), rep("neg", 6), "inconcl"), 
                   test2 = c( rep(NA, 3), "pos", rep("neg", 6) ))
df <- data %>%
  make_long(baseline, test1, test2)

You can adjust the placement of the count label, or switch it from geom_sankey_text to geom_sankey_label if you want a bounding box (I am not sure how well that works). I am also not sure whether geom_sankey_label recognises check_overlap, which is used above to avoid the count text being drawn over itself multiple times.
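As one possible variant of the approach above, the count can be folded into the node label itself so that only a single label layer is needed. This is a sketch rather than a tested recommendation: the column names n and node_label and the "name: count" format are choices made here for illustration, and it assumes ggsankey treats NA labels the same way it does in the code above.

```r
library(ggplot2)
library(ggsankey)
library(dplyr)

data <- data.frame(patient = 1:10,
                   baseline = rep("neg", 10),
                   test1 = c(rep("pos", 3), rep("neg", 6), "inconcl"),
                   test2 = c(rep(NA, 3), "pos", rep("neg", 6)))

df_lab <- data %>%
  make_long(baseline, test1, test2) %>%
  add_count(x, node, name = "n") %>%   # patients per node, kept on every row
  mutate(node_label = if_else(is.na(node), NA_character_,
                              paste0(node, ": ", n)))  # no label for missing nodes

p <- ggplot(df_lab, aes(x = x, next_x = next_x, node = node, next_node = next_node,
                        fill = factor(node))) +
  geom_sankey() +
  geom_sankey_label(aes(label = node_label), size = 3, color = "white") +
  scale_fill_manual(values = c("grey", "green", "red")) +
  theme_minimal() +
  theme(legend.position = "bottom",
        legend.title = element_blank())

print(p)
```

Using add_count() keeps the counts on the long data frame directly, so no separate summary table or join is required; whether the combined label reads better than a separate count is a matter of taste.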

Created on 2021-04-20 by the reprex package (v2.0.0)

Answer 1 | 2 | accepted | 2021-04-20 15:21:56