
How to plot an exploratory decision tree in R

Suppose a group of people is followed over time, and at three time points they are asked whether they would like to become a judge. Over time they change their opinion. I would like to show graphically how this opinion (become a judge / not become a judge) changes over time. Here is an idea of how it could be shown:

(image: sketch of the proposed plot)

Here is how to read the plot:

  • 1,462 students were sampled, and (400+295+22+147) of them would like to become a judge (the first bunch of lines going upwards).
  • A blue path means that in the end they became a judge.
  • A black path means that in the end they did something else.
  • Line goes up: they want to become a judge.
  • Line goes down: they don't want to become a judge.
  • The thickness of a line is proportional to the number of persons who went through that specific path (= the number plotted at the end of the path).

For example:
(a) 118 persons didn't want to become a judge during high school and university, but during practice they decided to become a judge.
(b) Up to practice, 695 persons had decided to become a judge, but after practice 400 became a judge and 295 did something else.

The main idea is to explore which decision paths exist and which are used most often.

I have several questions:

  1. Is there a name for this kind of graph?
  2. Is there already an R function that can plot this graph?
  3. If there is no R function: any idea how I can make this plot prettier? For example: (3.1) I would like the curves to be adjacent (no gaps between the curves and no overlapping). (3.2) The start and end of each curve should be perpendicular to the y-axis, i.e. the curve should leave and arrive horizontally.

Any suggestions?

Edit 1:
I found a plot which is similar to the one above: the riverplot; see, for example, the R library riverplot or R blogger. The drawback of a riverplot is that the individual threads or paths are lost at the crossing points.


Here is the data:

library(reshape2)
library(ggplot2)

# Data
wide <- data.frame(  grp        = 1:8,
                    time1_orig = rep(8,8)
                  , time2_orig = rep(c(4,12), each = 4)
                  , time3_orig = rep(c(2,6,10,14), each = 2)
                  , time4_orig = seq(1,15,2)
                  , n           = c(409,118,38,33,147,22,295,400)  # number of persons
                  , d           = c(1,0,1,0,1,0,1,0)               # decision
                  )

wide
  grp time1_orig time2_orig time3_orig time4_orig   n d
1   1          8          4          2          1 409 1
2   2          8          4          2          3 118 0
3   3          8          4          6          5  38 1
4   4          8          4          6          7  33 0
5   5          8         12         10          9 147 1
6   6          8         12         10         11  22 0
7   7          8         12         14         13 295 1
8   8          8         12         14         15 400 0

What follows are the transformations of the data needed for the plot:

w <- 500
wide$time1 <- wide$time1_orig + (cumsum(wide$n)-(wide$n)/2)/w
wide$time2 <- wide$time2_orig + (cumsum(wide$n)-(wide$n)/2)/w
wide$time3 <- wide$time3_orig + (cumsum(wide$n)-(wide$n)/2)/w
wide$time4 <- wide$time4_orig + (cumsum(wide$n)-(wide$n)/2)/w


long<- melt(wide[,-c(2:5)], id = c("d","grp","n"))
long$d<-as.character(long$d)
str(long)

And here is the ggplot:

gg1 <- ggplot(long, aes(x=variable, y=value, group=grp, colour=d)) +
          # position_dodge(height=) removed: current ggplot2 versions do not accept a height argument
          geom_line (aes(size=n)) +
          geom_text(aes(label=c( "1462",""   ,""   ,""   ,""   ,""   ,""   ,""
                                ,""    ,""   ,"598",""   ,""   ,"864",""   ,""
                                ,"527" ,""   ,""   ,"71" ,"169",""   ,""   ,"695"
                            ,"409" ,"118","38" ,"33" ,"147","22" ,"295","400"
                            )
                        , size = 300, vjust= -1.5)
                    ) +
           scale_colour_manual(name="",labels=c("Yes", "No"),values=c("royalblue","black")) +
           theme(legend.position = c(0,1),legend.justification = c(0, 1),
                 legend.text = element_text( size=12),
                 axis.text = element_text( size=12),
                 axis.title = element_text( size=15),
                 plot.title = element_text( size=15)) +
           guides(lwd="none") +
           labs(x="", y="Consider a judge career as an option:") +
           scale_y_discrete(labels="") +
           scale_x_discrete(labels = c(  "during high school"
                                       , "during university"
                                       , "during practice"
                                       , ""
                                    )
                                )
gg1
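
The numbers hard-coded in geom_text() are subtotals of n along the tree. A minimal sketch of how they could be derived from the data instead of being typed by hand, assuming the same layout as wide above (rebuilt here with only the columns needed, so the snippet runs on its own; the names total, after_hs and after_uni are illustrative):

```r
# Same path data as in `wide`, reduced to the branch positions and counts
wide <- data.frame(
  time2_orig = rep(c(4, 12), each = 4),         # branch after high school
  time3_orig = rep(c(2, 6, 10, 14), each = 2),  # branch after university
  n          = c(409, 118, 38, 33, 147, 22, 295, 400)
)

total     <- sum(wide$n)                          # 1462
after_hs  <- tapply(wide$n, wide$time2_orig, sum) # "4" = 598, "12" = 864
after_uni <- tapply(wide$n, wide$time3_orig, sum) # "2" = 527, "6" = 71, "10" = 169, "14" = 695

total
after_hs
after_uni
```

The names of the resulting vectors are the y positions of the branches, so they could be matched back to the rows of long when building the label column.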

I found a solution thanks to the riverplot library, which gives me this plot:

(image: resulting riverplot)

Here is the code:

library("riverplot")
# Create nodes
nodes <- data.frame(  ID     = paste(rep(c("O","C","R","D"),c(1,2,4,8)),c(1,1:2,1:4,1:8),sep="")
                    , x      = rep(0:3, c(1,2,4,8)) 
                    , y      = c(8, 12,4,14, 10,6,2, 15,13,11,9,7,5,3,1)
                    , labels = c("1462","864","598","695","169","71","527","400","295","22","147","33","38","118","409")
                    , col    = rep("lightblue", 15)
                    , stringsAsFactors= FALSE
                    )
# Create edges
edges <- data.frame(  N1 = paste(rep(c("O","C","R"), c(2,4,8)), rep(c(1,2,1,4:1)  , each=2), sep="")
                    , N2 = paste(rep(c("C","R","D"), c(2,4,8)), c(c(2:1,4:1,8:1)), sep="")
                    )

edges$Value   <- as.numeric(nodes$labels[2:15])
edges$col     <- NA
edges$col     <- rep(c("black","royalblue"), 7)
edges$edgecol <- "col"

# Create nodes/edges object
river <- makeRiver(nodes, edges)

# Define styles
style <-default.style()
style[["edgestyle"]]<-"straight"

# Plot
plot(river, default_style= style, srt=0, nsteps=200, nodewidth = 3)

# Add label
names <- data.frame (Time = c(" ", "during high school", "during university", "during practice")
                     ,hi  = c(0,0,0,0)
                     ,wi  = c(0,1,2,3)
                     )
with( names, text( wi, hi, Time) )
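
As noted in Edit 1 of the question, a riverplot merges the individual paths at each node. One possible alternative that keeps a separate ribbon per path is an alluvial diagram. The sketch below assumes the ggalluvial package, which is not used in the original post, and recodes the eight paths as yes/no answers at each stage (the column names school, university and practice are illustrative):

```r
library(ggplot2)
library(ggalluvial)  # assumption: installed from CRAN

# One row per path; answers recoded from the up/down branches of the question's data
paths <- data.frame(
  school     = rep(c("no", "yes"), each = 4),
  university = rep(c("no", "yes", "no", "yes"), each = 2),
  practice   = rep(c("no", "yes"), times = 4),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400)
)

p <- ggplot(paths, aes(axis1 = school, axis2 = university, axis3 = practice, y = n)) +
  geom_alluvium(aes(fill = practice), width = 1/8) +   # one ribbon per path, thickness ~ n
  geom_stratum(width = 1/8) +                          # stacked yes/no boxes at each stage
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("during high school", "during university", "during practice")) +
  scale_fill_manual(values = c(yes = "royalblue", no = "black")) +
  labs(y = "Number of persons", fill = "Became judge")
p
```

Because each row of paths is drawn as its own ribbon, the 118 persons of example (a) remain traceable from the first to the last stage.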

There is an alternative for plotting a sequence of categorical information:
TraMineR - Mining sequence data

TraMineR: a toolbox for exploring sequence data
TraMineR is an R package for mining, describing and visualizing sequences of states or events, and more generally discrete sequential data. Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories. Most of its features also apply, however, to non-temporal data such as text or DNA sequences.
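
A minimal sketch of how the question's data might be fed to TraMineR, assuming the eight paths are recoded as yes/no states per stage (the column names are illustrative) and the counts are passed as weights:

```r
library(TraMineR)  # assumption: installed from CRAN

# One row per distinct path, with the number of persons as weight
paths <- data.frame(
  school     = rep(c("no", "yes"), each = 4),
  university = rep(c("no", "yes", "no", "yes"), each = 2),
  practice   = rep(c("no", "yes"), times = 4),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400)
)

# Build a weighted state-sequence object from the three stage columns
seqs <- seqdef(paths, var = 1:3, weights = paths$n)

# Weighted frequency table of all distinct sequences (idxs = 0 means "all")
seqtab(seqs, idxs = 0)

# Sequence frequency plot: bar heights are proportional to the weighted frequencies
seqfplot(seqs, idxs = 0)
```

This shows which paths are used most often, but as stacked bars rather than as a branching tree.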
