Plot a decision tree with R
I have a 440 x 2 matrix that looks like this:
1 144
1 152
1 135
2 3
2 12
2 107
2 31
3 4
3 147
3 0
4 end
4 0
4 0
5 6
5 7
5 10
5 9
The left column holds the starting points; for example, in the app all the 1's on the left are on the same page. They lead to three choices: pages 144, 152 and 135. Each of these pages can then lead to another page, and so on, until the right-hand column says 'end'.
What I would like is a way to visualise the scale of this tree. I realise it will be quite large given the number of rows, so it may not be graph friendly. For clarity, I want to know how many possible routes there are in total (from every starting point, down every option it offers, to each end destination). I realise there will be overlaps, which is why I am finding this hard to calculate.
Secondly, each number has an associated title. I would like a function that, given a title, plots all the possible starting points and the associated paths that lead there. This should be a lot smaller and therefore graph friendly.
E.g.:
dta <- "
14 12 as
186 187 Frac
187 154 Low
23 52 Med
52 11 Lip
15 55 asd
11 42 AAA
42 154 BBB
154 end Coll"
Edited example data to show that some branches are not connected to the desired tree:
dta <- "
14 12 as
186 187 Frac
187 154 Low
23 52 Med
52 11 Lip
11 42 AAA
42 154 BBB
154 end Coll"
dta <- gsub(" +", ",", dta)  # turn each run of spaces into a single comma
df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")
library(data.tree)
Warning message:
package ‘data.tree’ was built under R version 3.2.5
tree <- FromDataFrameNetwork(df)
**Error in FromDataFrameNetwork(df) :**
**Cannot find root name. network is not a tree!**
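A quick base-R check of the edge list suggests why this fails. The sketch below assumes that `FromDataFrameNetwork` needs exactly one root, meaning exactly one value in the second column that never appears in the first; that assumption is inferred from the error message, not taken from the package documentation. The `edges` data frame simply restates the edited example.

```r
# Edited example data from above, as an edge list (child -> parent).
edges <- data.frame(
  from = c("14", "186", "187", "23", "52", "11", "42", "154"),
  to   = c("12", "187", "154", "52", "11", "42", "154", "end"),
  stringsAsFactors = FALSE
)

# Root candidates: values that are pointed to but never point anywhere.
roots <- setdiff(edges$to, edges$from)
print(roots)

# Nodes listed more than once in `from` would have several parents,
# which also rules out a tree.
multi_parent <- unique(edges$from[duplicated(edges$from)])
print(multi_parent)
```

With this data there are two root candidates, `12` and `end`, because the branch `14 -> 12` never reaches `end`. Adding a row that connects `12` to `end` would leave a single root.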
I made this example to show how column 1 leads to a value in column 2, which in turn refers to a value in column 1, until you reach the end. Different starting points can ultimately lead to paths of different lengths to the same destination. So this would look something like: (image not included)
So here, I wanted to see how you could get from all the starting points to 'Coll'.
I would greatly appreciate any help.
If you do indeed have a tree (e.g. no cycles), you can use data.tree.
Start by converting to a data.frame:
dta <- "
14 12 as
186 187 Frac
187 154 Low
23 52 Med
52 11 Lip
15 55 asd
11 42 AAA
42 154 BBB
154 end Coll
55 end efg
12 end hij"
dta <- gsub(" +", ",", dta)  # turn each run of spaces into a single comma
df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")
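As an alternative to replacing spaces with commas and calling `read.csv`, the same text can be parsed in one step with base R's `read.table`, which splits on whitespace by default. This is only a sketch; reading every column as character is a choice made here so that page numbers and `end` share one type.

```r
dta <- "
14 12 as
186 187 Frac
187 154 Low
23 52 Med
52 11 Lip
15 55 asd
11 42 AAA
42 154 BBB
154 end Coll
55 end efg
12 end hij"

# read.table splits on any run of whitespace and skips the leading blank line.
df <- read.table(
  text = dta,
  col.names = c("from", "to", "nme"),
  colClasses = "character",
  stringsAsFactors = FALSE
)
str(df)
```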
Now, convert to a data.tree:
library(data.tree)
tree <- FromDataFrameNetwork(df)
tree$leafCount
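Because the tree is rooted at `end`, its leaves are the starting points, so `leafCount` counts one route per starting point. In the 440-row matrix from the question a page can offer several choices, so the structure may not be a tree at all. For that case, a minimal base-R sketch of route counting with memoised recursion is shown below. It assumes there are no cycles, and the `edges` data, `count_routes` and `memo` are hypothetical names made up for illustration.

```r
# Hypothetical edge list in the shape of the question's matrix:
# each row is one choice leading from a page to the next page, or to "end".
edges <- data.frame(
  from = c("1", "1", "2", "2", "3", "4"),
  to   = c("2", "3", "4", "end", "4", "end"),
  stringsAsFactors = FALSE
)

# Pages reachable in one step from each page, keyed by page id.
children <- split(edges$to, edges$from)

# Memo table so each page is solved once, however many routes pass through it.
memo <- new.env()

count_routes <- function(page) {
  if (page == "end") return(1)
  if (!is.null(memo[[page]])) return(memo[[page]])
  kids <- children[[page]]
  # A page with no outgoing rows is a dead end and contributes no routes.
  n <- if (is.null(kids)) 0 else sum(vapply(kids, count_routes, numeric(1)))
  memo[[page]] <- n
  n
}

# Starting points are pages that no other page leads to.
starts <- setdiff(edges$from, edges$to)
routes <- vapply(starts, count_routes, numeric(1))
print(routes)       # routes to "end" per starting point
print(sum(routes))  # total number of routes
```

The memo table is what handles the overlaps mentioned in the question: a shared page is counted once and its result is reused by every route that passes through it.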
You can now navigate to any sub-tree for analysis and plotting, e.g. using any of the following:
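subTree <- tree$FindNode(187)                      # look the node up by name
subTree <- Climb(tree, nme = "Coll", nme = "Low")  # walk down from the root by nme: Coll, then Low
subTree <- tree$`154`$`187`                        # address the node by its path of names
subTree <- Clone(tree$`154`)                       # independent copy of the sub-tree rooted at 154 (used below)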
Maybe printing is all you need:
print(subTree, "nme")
This will print like so:
  levelName               nme
1 154                     Coll
2  ¦--187                 Low
3  ¦   °--186             Frac
4  °--42                  BBB
5      °--11              AAA
6          °--52          Lip
7              °--23      Med
Otherwise, use fancy plotting:
SetNodeStyle(subTree, style = "filled,rounded", shape = "box", fontname = "helvetica",
             label = function(node) node$nme, tooltip = "name")
plot(subTree, direction = "descend")
The result looks like this: (image not included)
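For the second part of the question (give a title, get every starting point and the path that leads from it to that title), the same result can be sketched in base R without data.tree. This is only an illustration: `paths_to_title` is a hypothetical helper, the data frame restates the example above, and the recursion assumes there are no cycles.

```r
df <- data.frame(
  from = c("14", "186", "187", "23", "52", "15", "11", "42", "154", "55", "12"),
  to   = c("12", "187", "154", "52", "11", "55", "42", "154", "end", "end", "end"),
  nme  = c("as", "Frac", "Low", "Med", "Lip", "asd", "AAA", "BBB", "Coll", "efg", "hij"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all paths from a starting point to the page with `title`.
paths_to_title <- function(df, title) {
  target <- df$from[df$nme == title]
  stopifnot(length(target) == 1)  # the title must identify exactly one page

  walk <- function(node, path) {
    path <- c(node, path)
    parents <- df$from[df$to == node]  # pages that lead directly to `node`
    # No page leads here, so `node` is a starting point and the path is complete.
    if (length(parents) == 0) return(list(path))
    do.call(c, lapply(parents, walk, path = path))
  }

  walk(target, character(0))
}

paths <- paths_to_title(df, "Coll")
for (p in paths) cat(paste(p, collapse = " -> "), "\n")
```

Each element of `paths` is a character vector running from a starting point to the target page, so branches that never reach the target (such as `14 -> 12`) are left out automatically.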