Properties and their values out of J48 tree (RWeka)

If you run the following:

library(RWeka) 
data(iris) 
res = J48(Species ~., data = iris)

res will be a list of class J48 inheriting from Weka_tree. If you print it:

R> res
J48 pruned tree
------------------

Petal.Width <= 0.6: setosa (50.0)
Petal.Width > 0.6
|   Petal.Width <= 1.7
|   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
|   |   Petal.Length > 4.9
|   |   |   Petal.Width <= 1.5: virginica (3.0)
|   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
|   Petal.Width > 1.7: virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :  9

I would like to get the properties and their values in order, from right to left. So for this case:

Petal.Width, Petal.Width, Petal.Length, Petal.Length.

I tried to convert res to a factor and to run the command:

str_extract(paste0(x, collapse=""), perl("(?<=\\|)[A-Za-z]+(?=\\|)"))

with no success. Keep in mind that the characters to the left of each property name should be ignored.
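For what it's worth, here is a minimal base-R sketch of the text-parsing route. It assumes the printed tree lines are already available as a character vector (they are copied by hand from the output above; capturing them from res, for example with capture.output(), is an assumption and not shown). It also assumes every split is a binary numeric split, so each split variable appears once with "<=" and once with ">".

```r
# Printed tree lines copied from the output above (hand-copied, not captured from res).
tree_lines <- c(
  "Petal.Width <= 0.6: setosa (50.0)",
  "Petal.Width > 0.6",
  "|   Petal.Width <= 1.7",
  "|   |   Petal.Length <= 4.9: versicolor (48.0/1.0)",
  "|   |   Petal.Length > 4.9",
  "|   |   |   Petal.Width <= 1.5: virginica (3.0)",
  "|   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)",
  "|   Petal.Width > 1.7: virginica (46.0/1.0)"
)

# Drop the leading "|   " indentation markers.
stripped <- sub("^(\\| +)*", "", tree_lines)

# Keep only the text before the comparison operator, i.e. the variable name.
split_vars <- sub(" *(<=|>|=).*$", "", stripped)

# Each binary split is printed twice; keep the "<=" branch only (assumption:
# numeric binary splits), which yields one entry per internal node.
first_branch <- grepl("<=", tree_lines, fixed = TRUE)
split_vars[first_branch]
```

This only works on the printed representation, so it is fragile compared to working with the tree structure itself.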

One way to do this is to convert the J48 object from RWeka to a party object from partykit. You just need to call as.party(res); this does all the parsing for you and returns a structure that is easier to work with, with standardized extractor functions etc.

In particular, you can then use all the advice given in other discussions about ctree objects etc.

And I think the following should do at least part of what you want:

library("partykit")
pres <- as.party(res)
partykit:::.list.rules.party(pres)
##                                                                                  2 
##                                                               "Petal.Width <= 0.6" 
##                                                                                  5 
##                     "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9" 
##                                                                                  7 
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5" 
##                                                                                  8 
##  "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5" 
##                                                                                  9 
##                                            "Petal.Width > 0.6 & Petal.Width > 1.7" 
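Note that the ::: operator reaches into an unexported function, so this helper may change between partykit versions. If only the variable names along each path are needed, the rule strings can be post-processed in base R. The following is a rough sketch, assuming the rules are a named character vector in the format printed above (copied here by hand so the example is self-contained):

```r
# Rule strings copied from the output above; names are the terminal node ids.
rules <- c(
  "2" = "Petal.Width <= 0.6",
  "5" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9",
  "7" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5",
  "8" = "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5",
  "9" = "Petal.Width > 0.6 & Petal.Width > 1.7"
)

# Split each rule into its individual conditions.
conditions <- strsplit(rules, " & ", fixed = TRUE)

# Keep the variable name, i.e. everything before the first space of each condition.
vars_per_leaf <- lapply(conditions, function(cond) sub(" .*$", "", cond))

vars_per_leaf[["7"]]
```

Each list element then holds the split variables from the root down to that terminal node.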

Update: The OP contacted me off-list with a related question, asking for a specific printed representation of the tree. I'm including my solution here in case it is useful for someone else.

He wanted ( ) symbols signalling the hierarchy levels plus the names of the splitting variables. One way to do so would be to (1) extract the variable names of the underlying data:

nam <- names(pres$data)

(2) Turn the recursive node structure of the tree into a flat list (which is somewhat more convenient for constructing the desired string):

tr <- as.list(pres$node)

(3a) Initialize the string:

str <- "("

(3b) Recursively add brackets and/or variable names to the string:

update_str <- function(x) {
   if(is.null(x$kids)) {
     str <<- paste(str, ")")
   } else {
     str <<- paste(str, nam[x$split$varid], "(")
     for(i in x$kids) update_str(tr[[i]])
   }
}

(3c) Call the recursion, starting from the root node:

update_str(tr[[1]])
str
## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
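The same recursion can also be written without modifying a global variable via <<-. The sketch below is only an illustration: it runs on a hand-built flat node list that mimics the kids and split$varid fields used above, and the names node_string, mock_nodes and mock_names are hypothetical. With the real objects one would pass tr and nam instead.

```r
# Hypothetical stand-ins for nam and tr, shaped like the tree above
# (inner nodes have $split$varid and $kids, terminal nodes have neither).
mock_names <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
mock_nodes <- list(
  list(id = 1, split = list(varid = 4), kids = c(2, 3)),
  list(id = 2),
  list(id = 3, split = list(varid = 4), kids = c(4, 9)),
  list(id = 4, split = list(varid = 3), kids = c(5, 6)),
  list(id = 5),
  list(id = 6, split = list(varid = 4), kids = c(7, 8)),
  list(id = 7),
  list(id = 8),
  list(id = 9)
)

# Return the string for one node instead of appending to a global variable.
node_string <- function(node, nodes, var_names) {
  # Terminal node: just close the current bracket.
  if (is.null(node$kids)) return(")")
  # Inner node: variable name, opening bracket, then the strings of all kids.
  kid_strings <- vapply(
    node$kids,
    function(i) node_string(nodes[[i]], nodes, var_names),
    character(1)
  )
  paste(c(var_names[node$split$varid], "(", kid_strings), collapse = " ")
}

result <- paste("(", node_string(mock_nodes[[1]], mock_nodes, mock_names))
result
```

Returning the string makes the function reusable for several trees in one session, since no state has to be reset between calls.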

I hope I'm not missing your point here, but I assume you want to somehow create and store the rules based on the terminal nodes of your tree model. Personally, I've found that the tree-building packages (RWeka, party, partykit, rpart) do not make it easy to create a useful list of rules after the model is built. Of course, when you have few variables and splits you can interpret the tree plot.

The only easy and robust way I've found so far (and the one I use myself) is the path.rpart function of the rpart package. If you really want to use RWeka this solution will seem irrelevant, but I'll give it a try:

library(rpart)

res = rpart(Species ~., data = iris)

res

# n= 150 
# 
# node), split, n, loss, yval, (yprob)
# * denotes terminal node
# 
# 1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)  
#   2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
#   3) Petal.Length>=2.45 100  50 versicolor (0.00000000 0.50000000 0.50000000)  
#     6) Petal.Width< 1.75 54   5 versicolor (0.00000000 0.90740741 0.09259259) *
#     7) Petal.Width>=1.75 46   1 virginica (0.00000000 0.02173913 0.97826087) *


# capture terminal nodes
terminal_nodes = rownames(res$frame)[res$frame$var =="<leaf>"]

# print rules for the terminal nodes
path.rpart(res ,nodes=terminal_nodes)

# node number: 2 
# root
# Petal.Length< 2.45
# 
# node number: 6 
# root
# Petal.Length>=2.45
# Petal.Width< 1.75
# 
# node number: 7 
# root
# Petal.Length>=2.45
# Petal.Width>=1.75


# store the rules as a list (print.it = FALSE avoids printing them again)
rules = path.rpart(res, nodes = terminal_nodes, print.it = FALSE)
# drop the leading "root" entry of each path
lapply(rules, "[", -1)

# $`2`
# [1] "Petal.Length< 2.45"
# 
# $`6`
# [1] "Petal.Length>=2.45" "Petal.Width< 1.75" 
# 
# $`7`
# [1] "Petal.Length>=2.45" "Petal.Width>=1.75" 
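If a single string per terminal node is preferred, similar to the partykit rule format, the list can be collapsed in base R. This is a small sketch that assumes the list structure shown above; the list is written out by hand here so the example runs without rpart.

```r
# Paths as returned by path.rpart, copied by hand from the output above.
rules <- list(
  "2" = c("root", "Petal.Length< 2.45"),
  "6" = c("root", "Petal.Length>=2.45", "Petal.Width< 1.75"),
  "7" = c("root", "Petal.Length>=2.45", "Petal.Width>=1.75")
)

# Drop the leading "root" entry and join the remaining conditions with " & ".
rule_strings <- vapply(
  rules,
  function(path) paste(path[-1], collapse = " & "),
  character(1)
)
rule_strings
```

The result is a named character vector, with the terminal node numbers as names.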
