How to plot an exploratory decision tree in R

Suppose a group of people is followed over time and, at three time points, each person is asked whether they would like to become a judge. Opinions change along the way. I would like to show graphically how the wish to become a judge (or not) changes over time. Here is an idea of how it could be shown:

[image: sketch of the proposed plot]

Here is how to read the plot:

  • 1,462 students were sampled, and 864 of them (400+295+22+147) would like to become a judge (first bunch of lines going upwards).
  • Blue path: in the end they became a judge.
  • Black path: in the end they did something else.
  • Line goes up: they want to become a judge.
  • Line goes down: they don't want to become a judge.
  • Line thickness is proportional to the number of people who followed that specific path (= the number plotted at the end of the path).

For example:
(a) 118 people didn't want to become a judge during high school or university, but during practice they decided to become one.
(b) Before practice, 695 wanted to become a judge; after practice, 400 of them became a judge and 295 did something else.

The main idea is to explore which decision paths exist and which are the most common.

I have several questions:

  1. Is there a name for this kind of graph?
  2. Is there already an R function that can plot this graph?
  3. If there is no R function: any idea how I can make this plot prettier? For example: (3.1) I would like the curves to be adjacent (no gaps between them and no overlap). (3.2) The start and end of the curves should be parallel to the y-axis.

Any suggestions?

Edit 1:
I found a plot that is similar to the one above: riverplot; see, for example, the R library riverplot or R-bloggers. The drawback of riverplot is that at the crossing points the individual threads or paths are lost.


Here are the data:

library(reshape2)
library(ggplot2)

# Data
wide <- data.frame(
  grp        = 1:8,
  time1_orig = rep(8, 8),
  time2_orig = rep(c(4, 12), each = 4),
  time3_orig = rep(c(2, 6, 10, 14), each = 2),
  time4_orig = seq(1, 15, 2),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400),  # number of persons on this path
  d          = c(1, 0, 1, 0, 1, 0, 1, 0)                # decision; plotted below as 0 = "Yes", 1 = "No"
)

wide
  grp time1_orig time2_orig time3_orig time4_orig   n d
1   1          8          4          2          1 409 1
2   2          8          4          2          3 118 0
3   3          8          4          6          5  38 1
4   4          8          4          6          7  33 0
5   5          8         12         10          9 147 1
6   6          8         12         10         11  22 0
7   7          8         12         14         13 295 1
8   8          8         12         14         15 400 0
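
The node totals shown in the plot (1462, then 598/864, then 527/71/169/695) follow directly from the eight leaf counts in `n`. A minimal base-R sketch of that arithmetic, assuming the rows are ordered as in `wide` above so that consecutive blocks of four and of two rows share a parent node (the names `leaf_n`, `after_high_school` and `after_university` are only illustrative):

```r
# Leaf counts, in the same row order as `wide`
leaf_n <- c(409, 118, 38, 33, 147, 22, 295, 400)

# Everyone starts in a single node
total <- sum(leaf_n)

# Rows 1-4 and rows 5-8 share a node after the first question
after_high_school <- as.vector(tapply(leaf_n, rep(1:2, each = 4), sum))

# Consecutive pairs of rows share a node after the second question
after_university <- as.vector(tapply(leaf_n, rep(1:4, each = 2), sum))

print(total)              # 1462
print(after_high_school)  # 598 864
print(after_university)   # 527 71 169 695
```

These are the same numbers that are hard-coded as text labels in the plotting code further down.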

What follows is the transformation of the data needed for the plot:

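# Shift each path by the midpoint of its cumulative-count band (scaled by w),
# so that thicker paths get more vertical room
w <- 500
offset <- (cumsum(wide$n) - wide$n / 2) / w
wide$time1 <- wide$time1_orig + offset
wide$time2 <- wide$time2_orig + offset
wide$time3 <- wide$time3_orig + offset
wide$time4 <- wide$time4_orig + offset

# Drop the *_orig columns and reshape to long format (one row per path and time point)
long <- melt(wide[, -c(2:5)], id = c("d", "grp", "n"))
long$d <- as.character(long$d)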
str(long)

And here is the ggplot:

# Written for ggplot2 >= 3.4.0 (linewidth aesthetic); on older versions use size instead.
# Node and leaf counts, one label per row of `long` (all time1 rows first, then time2, ...)
node_labels <- c("1462", "",    "",    "",   "",    "",    "",    "",
                 "",     "",    "598", "",   "",    "864", "",    "",
                 "527",  "",    "",    "71", "169", "",    "",    "695",
                 "409",  "118", "38",  "33", "147", "22",  "295", "400")

gg1 <- ggplot(long, aes(x = variable, y = value, group = grp, colour = d)) +
  geom_line(aes(linewidth = n)) +
  # vjust and size are constants, so they are set outside aes()
  geom_text(aes(label = node_labels), size = 4, vjust = -1.5) +
  scale_colour_manual(name = "", labels = c("Yes", "No"), values = c("royalblue", "black")) +
  theme(legend.position = c(0, 1), legend.justification = c(0, 1),
        legend.text = element_text(size = 12),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 15),
        plot.title = element_text(size = 15)) +
  guides(linewidth = "none") +
  labs(x = "", y = "Consider a judge career as an option:") +
  # y is numeric, so hide its labels with a continuous scale
  scale_y_continuous(labels = NULL) +
  scale_x_discrete(labels = c("during high school",
                              "during university",
                              "during practice",
                              ""))
gg1
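
For points (3.1) and (3.2), one option is to draw each path as a filled band instead of a thick line. The band edges come from the cumulative counts, so neighbouring bands touch without overlapping, and the ends of a polygon are vertical, i.e. parallel to the y-axis. What follows is only a rough base-graphics sketch under several assumptions: it reads (3.2) as asking for vertical band ends, reuses the coordinates and `w = 500` from above, uses straight segments rather than curves, and colours `d == 0` blue to mirror the "Yes" legend entry. The names `base_y`, `ymin` and `ymax` are illustrative.

```r
# Same coordinates and counts as in `wide` above
wide <- data.frame(
  time1 = rep(8, 8),
  time2 = rep(c(4, 12), each = 4),
  time3 = rep(c(2, 6, 10, 14), each = 2),
  time4 = seq(1, 15, 2),
  n     = c(409, 118, 38, 33, 147, 22, 295, 400),
  d     = c(1, 0, 1, 0, 1, 0, 1, 0)
)
w <- 500

# Lower and upper edge of each path's band: band i starts where band i - 1 ends
lower <- (cumsum(wide$n) - wide$n) / w
upper <- cumsum(wide$n) / w

# 8 x 4 matrices of band edges; the length-8 vectors recycle down the columns,
# so row i is shifted by lower[i] / upper[i] at every time point
base_y <- as.matrix(wide[, paste0("time", 1:4)])
ymin <- base_y + lower
ymax <- base_y + upper

# Empty canvas, then one polygon per path
plot(1, type = "n", xlim = c(1, 4), ylim = range(ymin, ymax),
     xlab = "", ylab = "", axes = FALSE)
for (i in seq_len(nrow(wide))) {
  polygon(x = c(1:4, 4:1),
          y = c(ymin[i, ], rev(ymax[i, ])),
          col = if (wide$d[i] == 0) "royalblue" else "black",
          border = NA)
}
axis(1, at = 1:3,
     labels = c("during high school", "during university", "during practice"))
```

The vertical thickness of each band is `n / w` at every time point, so thickness stays proportional to the number of people on the path.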

I found a solution thanks to the riverplot library, which gives me this plot:

[image: riverplot of the decision paths]

Here is the code:

library("riverplot")
# Create nodes
nodes <- data.frame(  ID     = paste(rep(c("O","C","R","D"),c(1,2,4,8)),c(1,1:2,1:4,1:8),sep="")
                    , x      = rep(0:3, c(1,2,4,8)) 
                    , y      = c(8, 12,4,14, 10,6,2, 15,13,11,9,7,5,3,1)
                    , labels = c("1462","864","598","695","169","71","527","400","295","22","147","33","38","118","409")
                    , col    = rep("lightblue", 15)
                    , stringsAsFactors= FALSE
                    )
# Create edges
edges <- data.frame(  N1 = paste(rep(c("O","C","R"), c(2,4,8)), rep(c(1,2,1,4:1)  , each=2), sep="")
                    , N2 = paste(rep(c("C","R","D"), c(2,4,8)), c(c(2:1,4:1,8:1)), sep="")
                    )

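# Each edge carries the count of the node it points to; look it up by node ID
# (taking nodes$labels[2:15] by position pairs the values with the wrong edges)
edges$Value   <- as.numeric(nodes$labels[match(edges$N2, nodes$ID)])
edges$col     <- rep(c("black", "royalblue"), 7)
edges$edgecol <- "col"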

# Create nodes/edges object
river <- makeRiver(nodes, edges)

# Define styles
style <-default.style()
style[["edgestyle"]]<-"straight"

# Plot
plot(river, default_style= style, srt=0, nsteps=200, nodewidth = 3)

# Add label
names <- data.frame (Time = c(" ", "during high school", "during university", "during practice")
                     ,hi  = c(0,0,0,0)
                     ,wi  = c(0,1,2,3)
                     )
with( names, text( wi, hi, Time) )
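
Because the edges form a tree, the value on each edge should equal the count of the node it points to, and the outgoing values of every inner node should add up to that node's own count. A minimal base-R sketch of that sanity check, reusing the node IDs and counts from the code above; the lookup vector `counts` and the name `outflow` are introduced here for illustration only, and the edge values are looked up by destination ID rather than by position:

```r
# Node IDs and counts, in the same order as in `nodes` above
ids    <- paste0(rep(c("O", "C", "R", "D"), c(1, 2, 4, 8)), c(1, 1:2, 1:4, 1:8))
counts <- c(1462, 864, 598, 695, 169, 71, 527,
            400, 295, 22, 147, 33, 38, 118, 409)
names(counts) <- ids

# Same edge list as above
edges <- data.frame(
  N1 = paste0(rep(c("O", "C", "R"), c(2, 4, 8)), rep(c(1, 2, 1, 4:1), each = 2)),
  N2 = paste0(rep(c("C", "R", "D"), c(2, 4, 8)), c(2:1, 4:1, 8:1)),
  stringsAsFactors = FALSE
)

# Value of an edge = count of its destination node
edges$Value <- unname(counts[edges$N2])

# Flow conservation: what leaves an inner node equals that node's count
outflow <- tapply(edges$Value, edges$N1, sum)
stopifnot(all(outflow == counts[names(outflow)]))
```

If the check fails after editing the node or edge tables, the edge widths in the plot will no longer match the node labels.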

There is an alternative for plotting a sequence of categorical information:
TraMineR - Mining sequence data

TraMineR: a toolbox for exploring sequence data
TraMineR is an R package for mining, describing and visualizing sequences of states or events and, more generally, discrete sequential data. Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories. Most of its features also apply, however, to non-temporal data such as text or DNA sequences.
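
As a rough illustration of how the data from the question might be passed to TraMineR, the sketch below recodes each of the eight paths as a sequence of three yes/no states and weights it by its count. The recoding is an assumption read off the coordinates in the question (a step up means "yes", a step down means "no"), the names `paths` and `judge_seq` are made up for this example, and the TraMineR calls run only if the package is installed.

```r
# One row per path: the answer at each time point plus the number of people
paths <- data.frame(
  high_school = rep(c("no", "yes"), each = 4),
  university  = rep(rep(c("no", "yes"), each = 2), times = 2),
  practice    = rep(c("no", "yes"), times = 4),
  n           = c(409, 118, 38, 33, 147, 22, 295, 400),
  stringsAsFactors = FALSE
)

if (requireNamespace("TraMineR", quietly = TRUE)) {
  library(TraMineR)
  # Build a weighted state-sequence object from the three answer columns
  judge_seq <- seqdef(paths,
                      var = c("high_school", "university", "practice"),
                      weights = paths$n)
  # Sequence frequency plot: bar heights reflect the weighted frequency of each path
  seqfplot(judge_seq)
}
```

With this recoding, the row "no, no, yes" is the group of 118 from example (a), and the two rows starting "yes, yes" add up to the 695 from example (b).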
