Plot a decision tree with R

I have a 440*2 matrix that looks like:

1   144
1   152
1   135
2   3
2   12
2   107
2   31
3   4
3   147
3   0
4   end
4   0
4   0
5   6
5   7
5   10
5   9

The left column holds the starting points: for example, all the 1's on the left are the same page in the app. That page leads to three choices, pages 144, 152 and 135. Each of these pages can then lead to another page, and so on until the right-hand column says 'end'. What I would like is a way to visualise the scale of this tree. I realise it will be quite large given the number of rows, so it may not be graph friendly. For clarity, I want to know how many possible routes there are in total (from every start point, down every option it gives, to each end destination). I realise there will be overlaps, which is why I am finding this hard to calculate.

Secondly, each number has an associated title. I would like a function which, given a title, plots all the possible starting points and their associated paths that lead there. This should be a lot smaller and therefore graph friendly.

eg

dta <- "
14  12  as
186  187  Frac
187  154  Low
23   52   Med
52   11   Lip
15  55  asd
11   42   AAA
42   154   BBB
154   end  Coll"

Edited example data, to show that some branches are not connected to the desired tree:

  dta <- "
  14  12  as
  186  187  Frac
  187  154  Low
  23   52   Med
  52   11   Lip
  11   42   AAA
  42   154   BBB
  154   end  Coll"

 dta <- gsub("   ", ",", dta, fixed = TRUE)   
 dta <- gsub("  ", ",", dta, fixed = TRUE)

df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")
library(data.tree)
Warning message:
package ‘data.tree’ was built under R version 3.2.5
tree <- FromDataFrameNetwork(df)
Error in FromDataFrameNetwork(df) :
  Cannot find root name. network is not a tree!

I made this example to show how a value in column 1 leads to a value in column 2, which then refers to a value in column 1, and so on until you reach the end. Different starting points can ultimately lead, via paths of different lengths, to the same destination. So this would look something like: [image: all paths lead to Coll]

So here, I wanted to see how you could go from all start points to 'Coll'

I would greatly appreciate any help.

If you do indeed have a tree (i.e. no cycles), you can use data.tree:

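Start by converting to a data.frame. Note that the rows "55 end efg" and "12 end hij" are added here, so that every branch terminates at 'end' and the network has a single root: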

 dta <- "
14  12  as
186  187  Frac
187  154  Low
23   52   Med
52   11   Lip
15  55  asd
11   42   AAA
42   154   BBB
154   end  Coll
55  end  efg
12  end  hij"

dta <- gsub("   ", ",", dta, fixed = TRUE)
dta <- gsub("  ", ",", dta, fixed = TRUE)


df <- read.csv(textConnection(dta), stringsAsFactors = FALSE, header = FALSE)
names(df) <- c("from", "to", "nme")
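
As an alternative to the two gsub calls, the same text can be parsed on arbitrary runs of whitespace with read.table. This is only a sketch in base R, assuming the three columns never contain embedded spaces; it also tolerates leading indentation on the data lines.

```r
dta <- "
14  12  as
186  187  Frac
187  154  Low
23   52   Med
52   11   Lip
15  55  asd
11   42   AAA
42   154   BBB
154   end  Coll
55  end  efg
12  end  hij"

# sep defaults to "" in read.table, which means "any amount of whitespace".
# colClasses = "character" keeps page ids as strings, matching the 'end' marker.
df <- read.table(text = dta, header = FALSE,
                 col.names = c("from", "to", "nme"),
                 colClasses = "character", stringsAsFactors = FALSE)
```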

Now, convert to a data.tree:

library(data.tree)
tree <- FromDataFrameNetwork(df)

tree$leafCount
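
In a strict tree each leaf is one start point with exactly one route to the root, so the leaf count is also the number of routes. If pages offer several choices, as in the 440-row matrix from the question, the network is generally not a tree, and the routes could instead be counted directly on the edge list. The following is a minimal base R sketch, assuming there are no cycles; count_routes and the toy edges data are hypothetical names made up for illustration.

```r
# Hypothetical toy edge list: page `from` offers a link to page `to`.
edges <- data.frame(
  from = c("1", "1", "2", "2",   "3", "4"),
  to   = c("2", "3", "4", "end", "4", "end"),
  stringsAsFactors = FALSE
)

# Count the distinct routes from every start point to `end`.
# Assumes the network has no cycles (otherwise the recursion never stops).
count_routes <- function(edges, end = "end") {
  memo <- new.env()                       # cache: page id -> number of routes
  routes_from <- function(node) {
    if (identical(node, end)) return(1)   # reaching the end is one complete route
    cached <- memo[[node]]
    if (!is.null(cached)) return(cached)
    nxt <- edges$to[edges$from == node]   # duplicate rows count as separate choices
    total <- if (length(nxt) == 0) 0 else sum(vapply(nxt, routes_from, numeric(1)))
    assign(node, total, envir = memo)
    total
  }
  # Start points are pages that no other page links to.
  starts <- setdiff(unique(edges$from), edges$to)
  vapply(starts, routes_from, numeric(1))
}

routes <- count_routes(edges)   # named vector: one entry per start point
sum(routes)                     # total number of routes
```

Pages that never reach 'end' (dead ends) contribute zero routes in this sketch, and the cache means overlapping sub-paths are only evaluated once.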

You can now navigate to any sub-tree for analysis and plotting, e.g. using any of the following:

subTree <- tree$FindNode(187)
subTree <- Climb(tree, nme = "Coll", nme = "Low")
subTree <- tree$`154`$`187`

subTree <- Clone(tree$`154`)

Maybe printing is all you need:

print(subTree , "nme")

This will print like so:

  levelName          nme
1 154                Coll
2  ¦--187             Low
3  ¦   °--186        Frac
4  °--42              BBB
5      °--11          AAA
6          °--52      Lip
7              °--23  Med

Otherwise, use fancy plotting:

SetNodeStyle(subTree , style = "filled,rounded", shape = "box", fontname = "helvetica", label = function(node) node$nme, tooltip = "name")
plot(subTree , direction = "descend")

This produces a plot of the subtree, with each node drawn as a rounded box labelled by its title (image not included).
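
For the second part of the question (look up a title, then show everything that leads there), one option is to filter the edge list before building or plotting anything. The sketch below uses base R only; paths_to_title is a hypothetical helper, and it assumes, as in the printed output above, that each title in nme belongs to the page in the from column.

```r
df <- data.frame(
  from = c("14", "186", "187", "23", "52", "15", "11", "42", "154", "55", "12"),
  to   = c("12", "187", "154", "52", "11", "55", "42", "154", "end", "end", "end"),
  nme  = c("as", "Frac", "Low", "Med", "Lip", "asd", "AAA", "BBB", "Coll", "efg", "hij"),
  stringsAsFactors = FALSE
)

# Return only the edges that lie on some path into the page with the given title.
paths_to_title <- function(df, title) {
  target   <- df$from[df$nme == title]    # page id(s) carrying this title
  reached  <- target                      # pages known to lead to the target
  frontier <- target
  while (length(frontier) > 0) {
    preds    <- unique(df$from[df$to %in% frontier])  # pages linking into the frontier
    frontier <- setdiff(preds, reached)               # keep only newly found pages
    reached  <- c(reached, frontier)
  }
  df[df$from %in% reached & df$to %in% reached, , drop = FALSE]
}

sub    <- paths_to_title(df, "Coll")
starts <- setdiff(sub$from, sub$to)       # start points that eventually reach "Coll"
```

Because visited pages are tracked in reached, this backward search also terminates if the data turn out to contain cycles. The resulting edge list is usually much smaller than the full network and can be passed on to whichever plotting tool is preferred.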
