简体   繁体   中英

How to draw network diagram from data frame columns in R?

I have a data frame of customers. I want to draw a customer stages as network diagram. Sample data is like below.

cust_id     checkin time           stage2                     stage3              checkout time
12345   2019-01-01 07:02:50     2019-01-01 07:23:25        2019-01-01 07:23:22  2019-01-01 08:37:43
56789   2019-01-01 07:25:21     2019-01-01 07:35:29        2019-01-01 07:35:27  2019-01-01 09:36:06
43256   2019-01-01 07:27:22     2019-01-01 07:42:49        NA                   2019-01-01 09:34:55
34567   2019-01-01 07:22:15     2019-01-01 08:25:35        2019-01-01 07:26:02  2019-01-01 09:00:40
89765   2019-01-01 08:29:35     2019-01-01 08:30:58        NA                   2019-01-01 09:02:48
23456   2019-01-01 08:54:12     2019-01-01 09:18:46        2019-01-01 09:08:34  2019-01-01 09:46:38

The raw data is look like above. There is no rule for customer ie, Some of the customers checkout after stage2 and some of the customers has to go stage 3 and checkout after stage 3.

Basically, I want to draw network map of the cusomers stages like below:

checkin > stage2 > stage3 > checkout
             |
            checkout

How to do that in R?
Tried like below with networkD3 package:

library(igraph)
library(networkD3)
p <- simpleNetwork(df, height="100px", width="100px",        
                   Source = 1,                 # column number of source
                   Target = 5,                 # column number of target
                   linkDistance = 10,          # distance between node. Increase this value to have more space between nodes
                   charge = -900,                # numeric value indicating either the strength of the node repulsion (negative value) or attraction (positive value)
                   fontSize = 14,               # size of the node names
                   fontFamily = "serif",       # font og node names
                   linkColour = "#666",        # colour of edges, MUST be a common colour for the whole graph
                   nodeColour = "#69b3a2",     # colour of nodes, MUST be a common colour for the whole graph
                   opacity = 0.9,              # opacity of nodes. 0=transparent. 1=no transparency
                   zoom = T                    # Can you zoom on the figure?
)

p

Please, help me to find the way to it.

I've found the DiagrammeR package useful. Converting your sample data to the formats used by Diagrammer would be awkward, so I've done it manually.

library(DiagrammeR)

# Manually represent your data as nodes and edges
nodes <- create_node_df(n=5, label=c("Check in", "Stage 1", "Stage 2", "Stage 3", "Check out"))
edges <- create_edge_df(from = c(1, 2, 3), to = c(2, 3, 4))
lastStage <- c(4, 4, 3, 4, 3, 3)

# Create the base graph
graph <- create_graph(nodes_df=nodes, edges_df=edges) 

# Produce the customer graphs
networks <- lapply(lastStage, function(x) graph %>% add_edge(from=x, to=5) %>% render_graph())
networks[[2]]

Giving, as an example,

图表输出

You have considerable control over the appearance of the graph. The DiagrammeR home page is here .

here's one solution using networkD3 ...

library(tidyverse)
library(lubridate)
library(networkD3)

data <- 
  tribble(
  ~cust_id, ~checkin.time,         ~stage2,               ~stage3,               ~checkout.time,
  12345,    "2019-01-01 07:02:50", "2019-01-01 07:23:25", "2019-01-01 07:23:22", "2019-01-01 08:37:43",
  56789,    "2019-01-01 07:25:21", "2019-01-01 07:35:29", "2019-01-01 07:35:27", "2019-01-01 09:36:06",
  43256,    "2019-01-01 07:27:22", "2019-01-01 07:42:49", NA,                    "2019-01-01 09:34:55",
  34567,    "2019-01-01 07:22:15", "2019-01-01 08:25:35", "2019-01-01 07:26:02", "2019-01-01 09:00:40",
  89765,    "2019-01-01 08:29:35", "2019-01-01 08:30:58", NA,                    "2019-01-01 09:02:48",
  23456,    "2019-01-01 08:54:12", "2019-01-01 09:18:46", "2019-01-01 09:08:34", "2019-01-01 09:46:38"
  ) %>% 
  mutate(across(!cust_id, ~ymd_hms(.x, tz = "UTC")))

data %>% 
  select(-cust_id) %>% 
  mutate(across(.fns = ~if_else(is.na(.x), NA_character_, cur_column()))) %>% 
  mutate(row = row_number()) %>%
  mutate(origin = .[[1]]) %>%
  gather("column", "source", -row, -origin) %>%
  mutate(column = match(column, names(data))) %>%
  filter(!is.na(source)) %>% 
  arrange(row, column) %>%
  group_by(row) %>%
  mutate(target = lead(source)) %>%
  ungroup() %>%
  filter(!is.na(source) & !is.na(target)) %>%
  mutate(target = if_else(target == "checkout.time", paste0(target, " from ", source), target)) %>% 
  select(source, target, origin) %>%
  group_by(source, target, origin) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  simpleNetwork()

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM