How to plot an exploratory decision tree in R
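Suppose a group of people is followed over time and, at 3 time points, they are asked whether they would like to become a judge. In the interim they change their opinion. I would like to show graphically how the opinion (become a judge / not become a judge) changes over time. Here is an idea of how to show it: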
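Here is how to read the plot:
For example:
(a) 118 people did not want to become a judge during high school and university, but during practice they decided to become one.
(b) Up to practice, 695 people had decided to become a judge, but after practice 400 became judges and 295 did something else.
The main idea is to explore which decision paths exist and which of them are taken most often.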
I have a few questions:
Any suggestions?
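Edit 1:
I found a plot similar to the one above: the river plot; see, for example, the R library riverplot or R-bloggers. The drawback of the river plot is that the individual threads or paths are lost at the crossing points.
Here is the data: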
library(reshape2)
library(ggplot2)
# Data
wide <- data.frame(
  grp        = 1:8,
  time1_orig = rep(8, 8),
  time2_orig = rep(c(4, 12), each = 4),
  time3_orig = rep(c(2, 6, 10, 14), each = 2),
  time4_orig = seq(1, 15, 2),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400),  # number of persons
  d          = c(1, 0, 1, 0, 1, 0, 1, 0)                # decision
)
wide
  grp time1_orig time2_orig time3_orig time4_orig   n d
1   1          8          4          2          1 409 1
2   2          8          4          2          3 118 0
3   3          8          4          6          5  38 1
4   4          8          4          6          7  33 0
5   5          8         12         10          9 147 1
6   6          8         12         10         11  22 0
7   7          8         12         14         13 295 1
8   8          8         12         14         15 400 0
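The node totals used later as plot labels (1462, 598, 864, 527, 71, 169, 695) do not have to be typed by hand. As a minimal sketch, assuming the `time2_orig` and `time3_orig` positions identify the branching nodes, they can be derived from the counts with base R. The variable names `total`, `by_time2` and `by_time3` are illustrative only and not part of the original code.

```r
# Same counts and layout positions as in the `wide` data frame above
wide <- data.frame(
  time2_orig = rep(c(4, 12), each = 4),
  time3_orig = rep(c(2, 6, 10, 14), each = 2),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400)
)

# Everyone starts at the single root node
total <- sum(wide$n)

# Sum the counts of all paths that pass through each node
by_time2 <- tapply(wide$n, wide$time2_orig, sum)  # nodes after the first split
by_time3 <- tapply(wide$n, wide$time3_orig, sum)  # nodes after the second split

print(total)
print(by_time2)
print(by_time3)
```

The names of `by_time2` and `by_time3` are the y positions of the nodes, so the totals can be matched back to the layout.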
Next comes the data transformation needed for the plot:
w <- 500
wide$time1 <- wide$time1_orig + (cumsum(wide$n)-(wide$n)/2)/w
wide$time2 <- wide$time2_orig + (cumsum(wide$n)-(wide$n)/2)/w
wide$time3 <- wide$time3_orig + (cumsum(wide$n)-(wide$n)/2)/w
wide$time4 <- wide$time4_orig + (cumsum(wide$n)-(wide$n)/2)/w
long<- melt(wide[,-c(2:5)], id = c("d","grp","n"))
long$d<-as.character(long$d)
str(long)
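The expression `(cumsum(wide$n) - (wide$n)/2)/w` appears to place each group at the midpoint of its own band when the groups are stacked on top of each other, scaled down by `w`, so that thick lines do not sit exactly on top of one another. The sketch below isolates that step; `band_midpoints` is a hypothetical helper name, not something from the original code.

```r
# Hypothetical helper: midpoint of each group's band when the groups are
# stacked in order, divided by a scaling constant w
band_midpoints <- function(n, w = 500) {
  (cumsum(n) - n / 2) / w
}

n <- c(409, 118, 38, 33, 147, 22, 295, 400)
offsets <- band_midpoints(n)

# One offset per group; they increase with the group order
print(round(offsets, 3))
```

A larger `w` pulls the lines closer together, a smaller `w` spreads them further apart.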
Here is the ggplot:
gg1 <- ggplot(long, aes(x=variable, y=value, group=grp, colour=d)) +
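# position_dodge() in current ggplot2 releases has no `height` argument;
# the vertical offsets are already encoded in `value`, so it is dropped here
geom_line(aes(size = n)) +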
geom_text(aes(label=c( "1462","" ,"" ,"" ,"" ,"" ,"" ,""
,"" ,"" ,"598","" ,"" ,"864","" ,""
,"527" ,"" ,"" ,"71" ,"169","" ,"" ,"695"
,"409" ,"118","38" ,"33" ,"147","22" ,"295","400"
)
, size = 300, vjust= -1.5)
) +
scale_colour_manual(name="",labels=c("Yes", "No"),values=c("royalblue","black")) +
theme(legend.position = c(0,1),legend.justification = c(0, 1),
legend.text = element_text( size=12),
axis.text = element_text( size=12),
axis.title = element_text( size=15),
plot.title = element_text( size=15)) +
guides(lwd="none") +
labs(x="", y="Consider a judge career as an option:") +
scale_y_discrete(labels="") +
scale_x_discrete(labels = c( "during high school"
, "during university"
, "during practice"
, ""
)
)
gg1
I found a solution thanks to the riverplot library, which gave me this plot:
Here is the code:
library("riverplot")
# Create nodes
nodes <- data.frame( ID = paste(rep(c("O","C","R","D"),c(1,2,4,8)),c(1,1:2,1:4,1:8),sep="")
, x = rep(0:3, c(1,2,4,8))
, y = c(8, 12,4,14, 10,6,2, 15,13,11,9,7,5,3,1)
, labels = c("1462","864","598","695","169","71","527","400","295","22","147","33","38","118","409")
, col = rep("lightblue", 15)
, stringsAsFactors= FALSE
)
# Create edges
edges <- data.frame( N1 = paste(rep(c("O","C","R"), c(2,4,8)), rep(c(1,2,1,4:1) , each=2), sep="")
, N2 = paste(rep(c("C","R","D"), c(2,4,8)), c(c(2:1,4:1,8:1)), sep="")
)
edges$Value <- as.numeric(nodes$labels[2:15])
edges$col <- NA
edges$col <- rep(c("black","royalblue"), 7)
edges$edgecol <- "col"
# Create nodes/edges object
river <- makeRiver(nodes, edges)
# Define styles
style <-default.style()
style[["edgestyle"]]<-"straight"
# Plot
plot(river, default_style= style, srt=0, nsteps=200, nodewidth = 3)
# Add label
names <- data.frame (Time = c(" ", "during high school", "during university", "during practice")
,hi = c(0,0,0,0)
,wi = c(0,1,2,3)
)
with( names, text( wi, hi, Time) )
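Another option for plotting a sequence of categorical information:
TraMineR - Mining sequence data
TraMineR: a toolbox for exploring sequence data
TraMineR is an R package for mining, describing and visualizing sequences of states or events, and more generally discrete sequential data. Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories. Most of its features also apply, however, to non-temporal data such as text or DNA sequences.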
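Independently of any plotting package, the question "which paths exist and which are used most often" can also be answered with a plain frequency table. The following is a minimal base-R sketch, not TraMineR code. It assumes that a step to a higher y position means "yes" and a step to a lower one means "no"; this is inferred from examples (a) and (b) in the question and is not stated there. The names `pos`, `answers` and `paths` are illustrative.

```r
# Layout positions and counts from the question
wide <- data.frame(
  time1_orig = rep(8, 8),
  time2_orig = rep(c(4, 12), each = 4),
  time3_orig = rep(c(2, 6, 10, 14), each = 2),
  time4_orig = seq(1, 15, 2),
  n          = c(409, 118, 38, 33, 147, 22, 295, 400)
)

pos <- as.matrix(wide[, c("time1_orig", "time2_orig", "time3_orig", "time4_orig")])

# ASSUMPTION: moving up between consecutive positions encodes "yes"
went_up <- pos[, -1] > pos[, -ncol(pos)]
answers <- ifelse(went_up, "yes", "no")  # columns: high school, university, practice

# One row per path, most frequent first
paths <- data.frame(
  path = apply(answers, 1, paste, collapse = " > "),
  n    = wide$n,
  stringsAsFactors = FALSE
)
paths <- paths[order(-paths$n), ]
paths$share <- round(paths$n / sum(paths$n), 3)

print(paths, row.names = FALSE)
```

Under that assumption the two most frequent paths are the ones where the answer never changes.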