简体   繁体   English

用R绘制决策树

[英]Plot a decision tree with R

I have a 440*2 matrix that looks like: 我有一个440 * 2矩阵,看起来像:

1   144
1   152
1   135
2   3
2   12
2   107
2   31
3   4
3   147
3   0
4   end
4   0
4   0
5   6
5   7
5   10
5   9

The left column are the starting points eg in the app all the 1's on the left would be on the same page. 左列是起点,例如,在应用程序中,左侧的所有1都位于同一页面上。 They lead to three choices, pages 144,152,135. 他们导致三个选择,第144,152,135页。 These pages can each then lead to another page, and so on until the right hand column says 'end'. 然后,这些页面可以各自通向另一页面,依此类推,直到右列显示“结束”为止。 What I would like is a way to visualise the scale of this tree. 我想要的是一种可视化这棵树的比例的方法。 I realise it will be quite large given the nb of rows so maybe not graph friendly, so for clarity I want to know how many possible routes there are in total (from every start point, down every option it gives and the end destinations of each. I realise there will be overlaps but thats why I am finding this hard to calculate). 我意识到在行数为nb的情况下它会很大,因此可能不是图形友好的,所以为了清楚起见,我想知道总共有多少条可能的路线(从每个起点开始,一直到给出的每个选项,以及每个终点的终点我意识到会有重叠,但这就是为什么我发现这个很难计算)。

secondly, each number has an associated title. 其次,每个数字都有一个相关的标题。 I would like to have a function whereby if you input a given title it will plot all the possible starting points and their associated paths that will lead there. 我希望有一个函数,如果您输入给定的标题,它将绘制所有可能的起点及其相关的路径,并以此为起点。 This should be a lot smaller and therefore graph friendly. 这应该小很多,因此图形友好。

eg 例如

dta <- "
14  12  as
186  187  Frac
187  154  Low
23   52   Med
52   11   Lip
15  55  asd
11   42   AAA
42   154   BBB
154   end  Coll"

Edited example data to show that some branches are not connected to desired tree 编辑的示例数据显示某些分支未连接到所需树

  dta <- "
  14  12  as
  186  187  Frac
  187  154  Low
  23   52   Med
  52   11   Lip
  11   42   AAA
  42   154   BBB
  154   end  Coll"

 dta <- gsub("   ", ",", dta, fixed = TRUE)   
 dta <- gsub("  ", ",", dta, fixed = TRUE)

df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")
library(data.tree)
Warning message:
package ‘data.tree’ was built under R version 3.2.5
tree <- FromDataFrameNetwork(df)
 **Error in FromDataFrameNetwork(df) :**
  **Cannot find root name. network is not a tree!**

I made this example to show how column 1 leads to a value in column 2 which then refers to a value in column 1 until you reach the end. 我制作了这个示例,以显示第1列如何导致第2列中的值,该值然后引用第1列中的值直到到达结尾。 Different starting points can ultimately lead to different length paths to same destination. 不同的起点最终可能导致到达同一目的地的长度不同的路径。 so this would look sometigng like: 所以看起来像: 所有路径通向Coll

So here, I wanted to see how you could go from all start points to 'Coll' 所以在这里,我想看看如何从所有起点转到“ Coll”

greatly appreciate any help 非常感谢任何帮助

If you have indeed a tree (eg no cycles), you can use data.tree: 如果确实有树(例如没有循环),则可以使用data.tree:

Start by converting to a data.frame: 首先转换为data.frame:

 dta <- "
14  12  as
186  187  Frac
187  154  Low
23   52   Med
52   11   Lip
15  55  asd
11   42   AAA
42   154   BBB
154   end  Coll
55  end  efg
12  end  hij"

dta <- gsub("   ", ",", dta, fixed = TRUE)
dta <- gsub("  ", ",", dta, fixed = TRUE)


df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")

Now, convert to a data.tree: 现在,转换为data.tree:

library(data.tree)
tree <- FromDataFrameNetwork(df)

tree$leafCount

You can now navigate to any sub-tree, for analysis and plotting. 现在,您可以导航到任何子树,以进行分析和绘图。 Eg using any of the following possibilities: 例如,使用以下任意一种可能性:

subTree <- tree$FindNode(187)
subTree <- Climb(tree, nme = "Coll", nme = "Low")
subTree <- tree$`154`$`187`

subTree <- Clone(tree$`154`)

Maybe printing is all you need: 也许您只需要打印:

print(subTree , "nme")

This will print like so: 它将像这样打印:

  levelName          nme
1 154                Coll
2  ¦--187             Low
3  ¦   °--186        Frac
4  °--42              BBB
5      °--11          AAA
6          °--52      Lip
7              °--23  Med

Otherwise, use fancy plotting: 否则,请使用花式绘图:

SetNodeStyle(subTree , style = "filled,rounded", shape = "box", fontname = "helvetica", label = function(node) node$nme, tooltip = "name")
plot(subTree , direction = "descend")

This looks like this: 看起来像这样:

在此处输入图片说明

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM