
R: from grouped data frame to Sankey diagram

I spent most of yesterday on the following problem and haven't found a solution yet:

I have a data frame with categorical data: say, category1 has values A and B; another column, category2, has values C, D, F, G; category3 has values H, and so on.

I want to make a Sankey diagram showing how many rows from category1 A fall into C, D, F, G (shown by the widths of the bands from node to node), and likewise for all other combinations in the grouped data frame.

It's basically a tree in which the width of each branch shows the count in that branch.

Is there a flexible way to do this, so that it works for most groupings in categorical data frames?

You can try the ggalluvial package:

library(ggalluvial)
library(ggplot2)

# some fake data: one row per flow
data <- data.frame(column1 = c('A','A','A','B','B','B'),
                   column2 = c('C','D','E','C','D','E'),
                   column3 = c('F','G','H','I','J','K'))

# add a constant as the frequency: if each flow counts as 1, you can do this
data$freq <- 1

# the plot: one axis per column, band width proportional to freq
ggplot(data,
       aes(y = freq, axis1 = column1, axis2 = column2, axis3 = column3)) +
  # colour the bands by the first column so that scale_fill_brewer() has an effect
  geom_alluvium(aes(fill = column1), width = 1/12) +
  geom_stratum(width = 1/12, fill = "black", color = "blue") +
  # label each stratum; label.strata is deprecated in newer ggalluvial releases
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("nice sankey")
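
If the data frame holds raw observations (one row per record) rather than one row per flow, the frequencies can be computed first. The following is a minimal sketch in base R, assuming hypothetical data in a data frame called raw with the same three columns; the names raw and flows are made up for illustration.

```r
# hypothetical raw data: one row per observation, with repeated combinations
raw <- data.frame(
  column1 = c("A", "A", "A", "B", "B", "B", "A"),
  column2 = c("C", "C", "D", "C", "E", "E", "D"),
  column3 = c("F", "F", "G", "F", "H", "H", "G")
)

# give every observation a weight of 1, then sum the weights per combination
raw$freq <- 1
flows <- aggregate(freq ~ column1 + column2 + column3, data = raw, FUN = sum)

# flows has one row per observed combination and its count in freq
print(flows)
```

The resulting flows data frame should be usable in place of data in the ggplot() call, with freq now carrying the real counts instead of a constant.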


If you're willing to rearrange your data into a node list and an edge list, you can take advantage of the D3 JavaScript library through the networkD3 package. Here's an example with dummy data (note that to use this library, the node ids referenced by the edge list must start at 0).

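library(tidyverse)
library(networkD3)

# node list: networkD3 identifies nodes by their zero-based row position
nodes <- tibble(id = 0:9, label = as.character(1:10))

# edge list: `from` and `to` hold zero-based node ids; every flow runs from
# nodes 0-4 to nodes 5-9, so there are no cycles, and all weights are positive
edges <- tibble(from = rep(0:4, times = 2),
                to = c(5:9, 6:9, 5),
                weight = runif(10, min = 1, max = 10))

# pass plain data frames, which is what sankeyNetwork() is documented to take
sankeyNetwork(Links = as.data.frame(edges),
              Nodes = as.data.frame(nodes),
              Source = "from",
              Target = "to",
              NodeID = "label",
              Value = "weight")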
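
To get from a categorical data frame to such a node list and edge list, one option is to count the flows between each pair of adjacent columns. The sketch below uses base R only; make_sankey_data() and the example data frame df are hypothetical names introduced for illustration, and the column order of df is assumed to be the left-to-right order of the diagram.

```r
# hypothetical categorical data, one row per observation
df <- data.frame(
  category1 = c("A", "A", "B", "B", "B"),
  category2 = c("C", "D", "C", "D", "D"),
  category3 = c("H", "H", "I", "H", "I"),
  stringsAsFactors = FALSE
)

# hypothetical helper: builds zero-indexed links and a node table
make_sankey_data <- function(df) {
  cols <- names(df)
  # count the flows between each pair of adjacent columns
  pairs <- lapply(seq_len(length(cols) - 1), function(i) {
    p <- data.frame(
      # prefix values with the column name so equal values in
      # different columns become different nodes
      source = paste(cols[i], df[[i]], sep = ": "),
      target = paste(cols[i + 1], df[[i + 1]], sep = ": "),
      value = 1,
      stringsAsFactors = FALSE
    )
    aggregate(value ~ source + target, data = p, FUN = sum)
  })
  links <- do.call(rbind, pairs)
  nodes <- data.frame(
    name = unique(c(as.character(links$source), as.character(links$target))),
    stringsAsFactors = FALSE
  )
  # replace node names by zero-based positions in the node table
  links$source <- match(as.character(links$source), nodes$name) - 1
  links$target <- match(as.character(links$target), nodes$name) - 1
  list(nodes = nodes, links = links)
}

sk <- make_sankey_data(df)

# Drawing the diagram requires the networkD3 package:
# networkD3::sankeyNetwork(Links = sk$links, Nodes = sk$nodes,
#                          Source = "source", Target = "target",
#                          Value = "value", NodeID = "name")
```

Because links only ever run from one column to the next, the result contains no cycles, and the helper should work for any number of categorical columns.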

