Properties and their values out of J48 tree (RWeka)
If you run the following:
library(RWeka)
data(iris)
res = J48(Species ~., data = iris)
res will be a list of class J48 inheriting from Weka_tree. If you print it:
R> res
J48 pruned tree
------------------
Petal.Width <= 0.6: setosa (50.0)
Petal.Width > 0.6
| Petal.Width <= 1.7
| | Petal.Length <= 4.9: versicolor (48.0/1.0)
| | Petal.Length > 4.9
| | | Petal.Width <= 1.5: virginica (3.0)
| | | Petal.Width > 1.5: versicolor (3.0/1.0)
| Petal.Width > 1.7: virginica (46.0/1.0)
Number of Leaves : 5
Size of the tree : 9
I would like to get the properties and their values in order, from right to left. So for this case:
Petal.Width, Petal.Width, Petal.Length, Petal.Length.
I tried converting res to a factor and running the command:
str_extract(paste0(x, collapse=""), perl("(?<=\\|)[A-Za-z]+(?=\\|)"))
without success. Keep in mind that the surrounding characters on the left should be ignored.
One way to do this is to convert the J48 object from RWeka to a party object from partykit. You just need to call as.party(res); this does all the parsing for you and returns a structure that is easier to work with, with standardized extractor functions etc.
In particular, you can then use all the advice given in other discussions about ctree objects etc. See:
How to extract the splitting rules for the terminal nodes of ctree()
Get decision tree rule/path pattern for every row of predicted dataset for rpart/ctree package in R
Identify all distinct variables within party ctree nodel
I think the following should do at least part of what you want:
library("partykit")
pres <- as.party(res)
partykit:::.list.rules.party(pres)
## 2
## "Petal.Width <= 0.6"
## 5
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9"
## 7
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5"
## 8
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5"
## 9
## "Petal.Width > 0.6 & Petal.Width > 1.7"
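If only the names of the splitting variables are needed, rule strings like those above can be reduced further with base R string functions. The following is a minimal sketch, not something partykit provides: the rules are hard-coded from the output above so that it runs without RWeka, and the names rules, conditions and vars_per_rule are made up for illustration.

```r
# Rule strings copied from the output above (hard-coded so no RWeka/Java is needed).
rules <- c(
  "2" = "Petal.Width <= 0.6",
  "5" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9",
  "7" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5",
  "8" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5",
  "9" = "Petal.Width > 0.6 & Petal.Width > 1.7"
)

# Split each rule into its single conditions.
conditions <- strsplit(rules, " & ", fixed = TRUE)

# Keep only the variable name, i.e. drop the operator and everything after it.
vars_per_rule <- lapply(conditions, function(cond) sub(" *(<=|>=|<|>).*$", "", cond))

# Variables along the path to the deepest leaf (node 7), from the root downwards.
vars_per_rule[["7"]]
## [1] "Petal.Width"  "Petal.Width"  "Petal.Length" "Petal.Width"
```

This assumes that variable names contain neither " & " nor a comparison operator, which holds for iris but not necessarily for other data.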
Update: The OP contacted me off-list with a related question, asking for a specific printed representation of the tree. I'm including my solution here in case it is useful for someone else.
He wanted ( ) symbols signalling the hierarchy levels, plus the names of the splitting variables. One way to do so would be to (1) extract the variable names of the underlying data:
nam <- names(pres$data)
(2) Turn the recursive node structure of the tree into a flat list (which is somewhat more convenient for constructing the desired string):
tr <- as.list(pres$node)
(3a) Initialize the string:
str <- "("
(3b) Recursively add brackets and/or variable names to the string:
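update_str <- function(x) {
  if(is.null(x$kids)) {
    # terminal node: close the bracket
    str <<- paste(str, ")")
  } else {
    # inner node: add the splitting variable, open a bracket, recurse into the kids
    str <<- paste(str, nam[x$split$varid], "(")
    for(i in x$kids) update_str(tr[[i]])
  }
}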
(3c) Call the recursion, starting from the root node:
update_str(tr[[1]])
str
## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
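The same string can also be built without modifying a global variable via <<-. The following is only a sketch of that idea on a toy nested list: node(), leaf, toy and bracket_string() are hypothetical names, and the list is a hand-written stand-in for the tree above, not the node structure that partykit uses.

```r
# Hypothetical stand-in for a tree: inner nodes carry a variable name and kids,
# terminal nodes have no kids.
node <- function(var = NULL, kids = list()) list(var = var, kids = kids)
leaf <- node()

# Hand-written copy of the J48 tree above.
toy <- node("Petal.Width", list(
  leaf,
  node("Petal.Width", list(
    node("Petal.Length", list(
      leaf,
      node("Petal.Width", list(leaf, leaf))
    )),
    leaf
  ))
))

# Return the string for one node instead of appending to a global variable.
bracket_string <- function(x) {
  if (length(x$kids) == 0) return(")")            # terminal node
  kid_strings <- vapply(x$kids, bracket_string, character(1))
  paste(c(x$var, "(", kid_strings), collapse = " ")
}

paste("(", bracket_string(toy))
## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
```

Returning the value keeps the function free of side effects, so it can be called repeatedly without re-initializing str first.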
I hope I'm not missing your point here, but I assume you want to create and store, somehow, the rules based on the terminal nodes of your tree model. Personally, I've found that the tree-building packages (RWeka, party, partykit, rpart) lack a way for the user to create a useful list of rules after the model is built. Of course, when you have few variables and splits you can interpret the tree plot.
The only easy and robust way I've found so far (and the one I use myself) is the path.rpart command of the rpart package. If you really want to use RWeka this solution will seem irrelevant, but I'll give it a try:
library(rpart)
res = rpart(Species ~., data = iris)
res
# n= 150
#
# node), split, n, loss, yval, (yprob)
# * denotes terminal node
#
# 1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
# 2) Petal.Length< 2.45 50 0 setosa (1.00000000 0.00000000 0.00000000) *
# 3) Petal.Length>=2.45 100 50 versicolor (0.00000000 0.50000000 0.50000000)
# 6) Petal.Width< 1.75 54 5 versicolor (0.00000000 0.90740741 0.09259259) *
# 7) Petal.Width>=1.75 46 1 virginica (0.00000000 0.02173913 0.97826087) *
# capture terminal nodes
terminal_nodes = rownames(res$frame)[res$frame$var == "<leaf>"]
# print rules for the terminal nodes
path.rpart(res, nodes = terminal_nodes)
# node number: 2
# root
# Petal.Length< 2.45
#
# node number: 6
# root
# Petal.Length>=2.45
# Petal.Width< 1.75
#
# node number: 7
# root
# Petal.Length>=2.45
# Petal.Width>=1.75
# print above rules as list
rules = path.rpart(res, nodes = terminal_nodes)
listed_rules = unlist(rules)
sapply(rules, "[", -1)
# $`2`
# [1] "Petal.Length< 2.45"
#
# $`6`
# [1] "Petal.Length>=2.45" "Petal.Width< 1.75"
#
# $`7`
# [1] "Petal.Length>=2.45" "Petal.Width>=1.75"
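Once the rules are available as character vectors like these, they can be collapsed into single expressions and applied to the data. The following is a rough sketch, assuming the rules are valid R expressions over the columns of iris; the list is hard-coded from the output above, and rule_list, rule_strings and n_match are names made up for illustration.

```r
# Rules per terminal node, copied from the output above.
rule_list <- list(
  "2" = "Petal.Length< 2.45",
  "6" = c("Petal.Length>=2.45", "Petal.Width< 1.75"),
  "7" = c("Petal.Length>=2.45", "Petal.Width>=1.75")
)

# Collapse the conditions of each terminal node into one logical expression.
rule_strings <- vapply(rule_list, paste, character(1), collapse = " & ")

# Evaluate each expression inside iris and count the matching rows.
n_match <- vapply(
  rule_strings,
  function(r) sum(eval(parse(text = r), envir = iris)),
  integer(1)
)
n_match
##  2  6  7 
## 50 54 46 
```

The counts agree with the n column of the terminal nodes in the rpart printout. Note that eval(parse()) runs arbitrary code, so this should only be used on rule strings you generated yourself.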