Creating Sankey diagram in R; making the plot output interpretable

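I have created a Sankey diagram in R for the first time, showing the links between antecedent and consequent events and the number of times they occur. Here is a mock example of the type of data I am working with: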

#df creation=====================================================

df<-structure(list(Antecedent = c("Activity 1", "Activity 1", "Activity 1", 
                                  "Activity 1", "Activity 1", "Activity 2", "Activity 2", "Activity 2", 
                                  "Activity 2", "Activity 2", "Activity 3", "Activity 3", "Activity 3", 
                                  "Activity 3", "Activity 3", "Activity 4", "Activity 4", "Activity 4", 
                                  "Activity 4", "Activity 4", "Activity 5", "Activity 5", "Activity 5", 
                                  "Activity 5", "Activity 5"), 
                   Consequent = c("Activity 1", "Activity 2", 
                   "Activity 3", "Activity 4", "Activity 5", "Activity 1", "Activity 2", 
                   "Activity 3", "Activity 4", "Activity 5", "Activity 1", "Activity 2", 
                   "Activity 3", "Activity 4", "Activity 5", "Activity 1", "Activity 2", 
                   "Activity 3", "Activity 4", "Activity 5", "Activity 1", "Activity 2", 
                   "Activity 3", "Activity 4", "Activity 5"), 
                   count = c(1694888L,170L, 4060L, 0L, 7L, 255L, 46564L, 756L, 38L, 43L, 3926L, 523L, 
                                      303979L, 689L, 711L, 0L, 51L, 670L, 35210L, 383L, 13L, 59L, 800L, 
                                      508L, 14246L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
                                      -25L))

This is the code I used to wrangle the data so that it fits the Sankey diagram function in the networkD3 library.

#libraries========================================
library(dplyr)
library(networkD3)


# From these flows we need to create a node data frame: it lists every entity involved in the flows
nodes <- data.frame(
  name=c(as.character(df$Antecedent),
         as.character(df$Consequent)) %>% unique()
)



# With networkD3, connections must be given as zero-based node IDs, not as the names used in the links data frame, so we need to reformat them.
df$IDsource <- match(df$Antecedent, nodes$name)-1
df$IDtarget <- match(df$Consequent, nodes$name)-1



# Make the Network
p <- sankeyNetwork(Links = df, Nodes = nodes,
                   Source = "IDsource", Target = "IDtarget",
                   Value = "count", NodeID = "name",units = "%")
p

However, this gives me a plot that looks terrible and is almost impossible to interpret:

[image: the resulting plot]

I was hoping to get something similar to the example in the link below (which is where I found the code):

Most basic Sankey diagram

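I would still like to achieve this kind of output. I think the most obvious problem is the naming convention of the Antecedent and Consequent variables in my df, since they use the same names.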

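But I wonder whether there is also a way (without changing the naming convention in my df) to create a Sankey diagram similar to the one in the link above. Can anyone offer a solution? Many thanks!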

If you want to stick with networkD3, I think you need to disambiguate the node names to avoid cycles in the resulting graph.

library(dplyr)
library(networkD3)

# Disambiguate node names
links <- df %>% 
  mutate(
    Antecedent = paste("Antecedent", Antecedent),
    Consequent = paste("Consequent", Consequent),
  )

# Create a data frame for nodes
# (dplyr >= 1.1.0 warns about a multi-row summarise(); reframe() is its replacement there)
nodes <- links %>% 
  summarise(name = union(Antecedent, Consequent))

# Find node IDs for links
links$IDsource <- match(links$Antecedent, nodes$name) - 1
links$IDtarget <- match(links$Consequent, nodes$name) - 1

sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "IDsource",
  Target = "IDtarget",
  Value = "count",
  NodeID = "name"
) -> p
#> Links is a tbl_df. Converting to a plain data frame.
#> Nodes is a tbl_df. Converting to a plain data frame.

[image: snapshot of the networkD3 Sankey network]
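To illustrate why the prefixing helps, here is a minimal base R sketch. It assumes a small made-up data frame called `toy` (not the `df` from the question) and checks which zero-based node IDs appear as both a source and a target. With shared names every node sits on both sides, so links can loop back. After prefixing, the two sets no longer overlap.

```r
# Toy data (hypothetical, much smaller than the df in the question)
toy <- data.frame(
  Antecedent = c("Activity 1", "Activity 1", "Activity 2", "Activity 2"),
  Consequent = c("Activity 1", "Activity 2", "Activity 1", "Activity 2"),
  count      = c(10L, 2L, 3L, 8L),
  stringsAsFactors = FALSE
)

# Zero-based node IDs as in the question: both columns share one set of names
nodes_shared <- data.frame(name = union(toy$Antecedent, toy$Consequent))
src_shared <- match(toy$Antecedent, nodes_shared$name) - 1
tgt_shared <- match(toy$Consequent, nodes_shared$name) - 1

# IDs used as both source and target; any overlap allows loops in the graph
overlap_shared <- intersect(src_shared, tgt_shared)

# Same computation after prefixing each column, as in the answer
ante <- paste("Antecedent", toy$Antecedent)
cons <- paste("Consequent", toy$Consequent)
nodes_split <- data.frame(name = union(ante, cons))
src_split <- match(ante, nodes_split$name) - 1
tgt_split <- match(cons, nodes_split$name) - 1
overlap_split <- intersect(src_split, tgt_split)

length(overlap_shared)  # 2: both nodes are sources and targets
length(overlap_split)   # 0: sources and targets are disjoint
```

The same check could be run on `links$IDsource` and `links$IDtarget` from the answer before calling `sankeyNetwork()`.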

Alternatively, you could use ggplot2 and ggforce to create a static plot. This also needs some preprocessing to get the data into the right format:

library(ggplot2)

df %>% 
  ggforce::gather_set_data(1:2) %>% 
  ggplot(aes(x, split = y, id = id, value = count)) +
    ggforce::geom_parallel_sets(aes(fill = Antecedent)) +
    ggforce::geom_parallel_sets_axes(axis.width = 0.05) +
    ggforce::geom_parallel_sets_labels(
      angle = 0,
      hjust = 0,
      position = position_nudge(0.05)
    )
