Using a Sankey plot to show data flow with R ggalluvial, and styling it with ggplot

I have a data table of patient clusters before treatment (consensus) and after treatment (single drug), and I want to show how patients flow between clusters before and after treatment. The actual cluster numbers don't mean much here; the important point is that most patients who cluster together before treatment also end up together after treatment. Some move around.

Here is a screenshot of the data: [image]

Dummy dataset:

CLL3S <- structure(list(Stimulation = c("3S", "3S", "3S", "3S", "3S",
"3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S",
"3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S",
"3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S",
"3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S",
"3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S",
"3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S", "3S"), Patient.ID = c("S3077497",
"S1041120", "S162465", "S563275", "S2911623", "S3117192", "S2859024",
"S2088278", "S3306185", "S190789", "S12146451", "S2170842", "S115594",
"S2024203", "S1063872", "S2914138", "S303984", "S570813", "S2176683",
"S820460", "S1235729", "S3009401", "S2590229", "S629309", "S1208256",
"S2572773", "S3180483", "S3032079", "S3217608", "S5566943", "S5473728",
"S104259", "S2795346", "S2848989", "S2889801", "S2813983", "S2528246", 
"S3151923", "S2592908", "S2603793", "S5565867", "S3127064", "S675629", 
"S834679", "S3011944", "S5011583", "S2687896", "S2998620", "S651963", 
"S2104595", "S2433454", "S2565220", "S3307762", "S294778", "S995510", 
"S2476822", "S140868", "S1018263", "S2990223", "S5524130", "S1042529", 
"S999706", "S363003", "S2303087", "S868213", "S5568359", "S3174542", 
"S521782", "S3294727"), `Cluster assigned consensus` = c(2, 2, 
2, 2, 2, 5, 5, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 4, 3, 7, 4, 4, 4, 
4, 4, 4, 8, 8, 4, 7, 4, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8, 7, 7, 
7, 7, 7, 3, 7, 6, 6, 6, 6, 6, 8, 7, 7, 5, 7, 5, 7, 7, 7, 8, 8, 
4, 7, 4, 7), `Cluster assigned single drug` = c("1", "1", "1", 
"1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "2", "2", 
"2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3", "4", "4", 
"4", "4", "5", "5", "5", "5", "5", "5", "5", "6", "6", "6", "6", 
"6", "6", "6", "6", "6", "6", "6", "7", "7", "7", "7", "7", "7", 
"7", "7", "8", "8", "8", "8", "8", "8", "8", "8", "8", "8", "8", 
"8"), count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -69L), class = c("tbl_df",
"tbl", "data.frame"))

This is my first time making a Sankey plot, so I'm no expert. I added the count column so that each patient has a count of 1; the flow thickness then comes from adding up the counts.
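
To illustrate how a per-patient count of 1 turns into flow thickness, here is a minimal base-R sketch. The data frame `toy` and its columns `before` and `after` are hypothetical stand-ins for the real table, not part of the original data. Summing `count` over each before/after pair gives the number of patients taking that route, which is the width those flows add up to in the plot.

```r
# Hypothetical toy data: one row per patient, each with count = 1
toy <- data.frame(
  before = c("1", "1", "1", "2", "2"),
  after  = c("1", "1", "2", "2", "2"),
  count  = 1
)

# Sum the counts for every before/after pair that occurs in the data;
# each total is the number of patients moving along that route.
flows <- aggregate(count ~ before + after, data = toy, FUN = sum)
print(flows)
```

The wide-format ggalluvial call does not need this aggregation step; the table is only a quick way to check which routes should look thick or thin.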

I adapted the code from an R tutorial; the code for the visualisation is here:

library(ggplot2)
library(ggalluvial)

ggplot(data = CLL3S,
       aes(axis1 = `Cluster assigned consensus`, axis2 = `Cluster assigned single drug`, y = count)) +
  scale_x_discrete(limits = c("Consensus cluster", "Single-drug cluster"), expand = c(.1, .1)) +
  xlab("Clusters") +
  geom_alluvium(aes(fill = `Cluster assigned consensus`)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_minimal() +
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters",
          "3S stimulated patients")

This kind of works, but the figure isn't pretty:

[image]

As you can see, the cluster numbers are surrounded by huge, empty white boxes. How can I make the boxes smaller? And how do I give the boxes different colors, and make sure that if I change the fill in geom_alluvium, the flows match the colors of the (consensus) boxes?

You control that in geom_stratum. Try this:

library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

# Define the number of colors you want
nb.cols <- 10
# Interpolate the 8-color "Set2" palette to nb.cols colors
mycolor1 <- colorRampPalette(brewer.pal(8, "Set2"))(nb.cols)
# brewer.pal() needs n >= 3; n = 2 only triggers a warning and returns 3 colors
mycolor2 <- colorRampPalette(brewer.pal(3, "Set2"))(nb.cols)

# Outline colors passed to the legend keys via override.aes
mycolors <- c("red","blue","green","orange")

ggplot(data = CLL3S,
       aes(y = count, axis1 = `Cluster assigned consensus`, axis2 = `Cluster assigned single drug`)) +
  scale_x_discrete(limits = c("Consensus cluster", "Single-drug cluster"), expand = c(.1, .1)) +
  labs(x="Clusters") +
  geom_alluvium(aes(fill = `Cluster assigned consensus`)) +
  # width sets the box width; fill takes one color per box (8 per axis)
  geom_stratum(width = 1/4, fill = c(mycolor1[1:8],mycolor1[1:8]), color = "red") +
  #geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  #scale_fill_manual(values = mycolors) +
  theme_minimal() +
  guides(fill=guide_legend(override.aes = list(color=mycolors)))+
  ggtitle("Patient flow between the Consensus clusters and Single-drug treated clusters",
          "3S stimulated patients")

Output: [image]
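
If the aim is for the flows to share the colors of the consensus boxes, one possible approach is to map fill to the cluster label in both layers and let a single discrete scale color them. The following is only a sketch under assumptions: the data frame `toy` and its columns `before` and `after` are hypothetical stand-ins for CLL3S, the clusters are stored as factors rather than numbers, and it has not been run against the full dataset.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical toy data standing in for CLL3S: one row per patient
toy <- data.frame(
  before = factor(c(1, 1, 2, 2, 3, 3)),
  after  = factor(c(1, 2, 2, 2, 3, 1)),
  count  = 1
)

p <- ggplot(toy, aes(axis1 = before, axis2 = after, y = count)) +
  scale_x_discrete(limits = c("Consensus cluster", "Single-drug cluster"),
                   expand = c(.1, .1)) +
  # Flows take the color of the cluster they start from
  geom_alluvium(aes(fill = before)) +
  # Narrower boxes, each filled according to its own cluster label
  geom_stratum(aes(fill = after_stat(stratum)), width = 1/8) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  # One discrete palette shared by the flows and the boxes
  scale_fill_brewer(palette = "Set2", name = "Cluster") +
  theme_minimal()

print(p)
```

Because the fill is discrete here, the legend shows one key per cluster instead of a continuous gradient. The "Set2" palette offers at most 8 colors, which is just enough for the eight clusters in the question.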
