Closures in R, calling functions within a function, recursive functions

I am new to R and am trying out a classification decision tree using ctree() from the party package. All seems to be fine: I get the expected result and a descriptive plot.

Now, if I want to extract the results from the summary of the fit, I have to traverse each node and extract the information. Fortunately, code for this has already been written by @baydoganm. I want to extend this code and write the results to a data frame instead of printing them.

Reproducible code:

library(party)
ct <- ctree(Species ~ ., data = iris)

traverse <- function(treenode) {
  if (treenode$terminal) {
    bas <- paste(treenode$nodeID, treenode$prediction)
    print(bas) # here the results are printed
    return(0)
  }

  traverse(treenode$left)
  traverse(treenode$right)
}

traverse(ct@tree) # function call

This works fine and I get the output on the console. Now, when I try to write the results to a data frame instead, I run into problems.

What I have tried so far: writing to a list using mutable closures. I am not sure how to get it working.

l <- list()
count <- 0
traverse1 <- function(treenode, l) {
  if (treenode$terminal == TRUE) {
    count <<- count + 1
    print(count)
    node <- c(treenode$nodeID)
    pred <- c(treenode$prediction)
    l[[count]] <- data.frame(node, pred) # write results in the data frame
  }

  traverse1(treenode$left, l)
  traverse1(treenode$right, l)
}
test <- traverse1(ct@tree, l) # function call

I only get the result of my last call to the function; the rest are NULL.
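One likely reason the attempt above loses results is that R passes the list l by value: the assignment l[[count]] <- ... changes only the local copy inside that call, and the function never returns it. A minimal sketch of one alternative is to have each call return a data frame and let inner nodes combine their children's results with rbind(). To stay runnable without party, the example uses a hypothetical mock tree built from nested lists with the same field names (nodeID, terminal, prediction, left, right); the helper names make_leaf, make_split, and collect_leaves are illustrative, and prediction is simplified to a single label.

```r
# Hypothetical stand-ins for party's node structure (assumption: same field names).
make_leaf <- function(id, pred) {
  list(nodeID = id, terminal = TRUE, prediction = pred, left = NULL, right = NULL)
}
make_split <- function(id, left, right) {
  list(nodeID = id, terminal = FALSE, prediction = NA, left = left, right = right)
}

mock_tree <- make_split(1,
  make_leaf(2, "setosa"),
  make_split(3, make_leaf(4, "versicolor"), make_leaf(5, "virginica")))

# Each call returns a data frame; inner nodes stack the results of their children.
collect_leaves <- function(treenode) {
  if (treenode$terminal) {
    return(data.frame(node = treenode$nodeID, pred = treenode$prediction,
                      stringsAsFactors = FALSE))
  }
  rbind(collect_leaves(treenode$left), collect_leaves(treenode$right))
}

leaves <- collect_leaves(mock_tree)
print(leaves)
```

With a real party tree, treenode$prediction may hold a vector rather than a single label, so the data.frame() call would probably need adjusting.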

Smart way: use assign() to write to the global environment:

require(party) 
ct <- ctree(Species ~ ., data = iris)

tt <- NULL

traverse <- function(treenode){
  if(treenode$terminal){
    bas=paste(treenode$nodeID,treenode$prediction)
    assign("tt", c(tt, bas), envir = .GlobalEnv)
    print(bas) #here the results are printed
    return(0)
  } 

  traverse(treenode$left)
  traverse(treenode$right)
}

traverse(ct@tree) #function call

# strsplit() is base R, so no extra package is needed
parts <- strsplit(tt, " ")
data.frame(node.id = sapply(parts, `[[`, 1),
           prediction = sapply(parts, `[[`, 2))
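Writing to the global environment works but ties the function to a variable named tt. Since the question asks about closures, here is a hedged sketch of the same idea with the accumulator kept in an enclosing function's environment and updated with <<-. The mock tree and the names leaf, split_node, make_collector, visit, and results are hypothetical; only the field names follow the code above.

```r
# Hypothetical mock tree with the same field names as a party node (assumption).
leaf <- function(id, pred) list(nodeID = id, terminal = TRUE, prediction = pred)
split_node <- function(id, left, right) {
  list(nodeID = id, terminal = FALSE, left = left, right = right)
}
mock_tree <- split_node(1, leaf(2, "setosa"),
                        split_node(3, leaf(4, "versicolor"), leaf(5, "virginica")))

make_collector <- function() {
  results <- character(0)  # lives in the closure, not in .GlobalEnv
  visit <- function(treenode) {
    if (treenode$terminal) {
      # <<- updates 'results' in the enclosing environment
      results <<- c(results, paste(treenode$nodeID, treenode$prediction))
      return(invisible(NULL))
    }
    visit(treenode$left)
    visit(treenode$right)
    invisible(NULL)
  }
  list(visit = visit, get = function() results)
}

collector <- make_collector()
collector$visit(mock_tree)
tt_local <- collector$get()
print(tt_local)
```

Each call to make_collector() starts with a fresh, empty accumulator, so repeated traversals do not append to stale results the way a global tt would.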

Dirty way: use sink() to save the printed output.

sink(file = "test.csv", append = T)
traverse(ct@tree) #function call
sink()

tt <- read.csv("test.csv", header = F)

If you use the new, improved ctree() implementation from the partykit package, then it has all the information you need in its fitted component:

library("partykit")
ct <- ctree(Species ~ ., data = iris)
head(fitted(ct))
##   (fitted) (weights) (response)
## 1        2         1     setosa
## 2        2         1     setosa
## 3        2         1     setosa
## 4        2         1     setosa
## 5        2         1     setosa
## 6        2         1     setosa

So for a classification tree you can easily construct the table of absolute frequencies of the response using xtabs() (or table()). And for a regression tree, tapply() could easily be used to get means, medians, etc.
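For the regression case, a small sketch of the tapply() idea follows. The data frame is a hypothetical stand-in for what fitted() returns (columns (fitted) and (response)); the numbers are made up for illustration and do not come from a fitted tree.

```r
# Hypothetical stand-in for fitted(ct) of a regression tree (made-up values).
fit <- data.frame(c(2, 2, 3, 3, 3), c(1.0, 3.0, 10.0, 20.0, 60.0))
names(fit) <- c("(fitted)", "(response)")

# Mean and median of the response within each terminal node.
node_means   <- tapply(fit[["(response)"]], fit[["(fitted)"]], mean)
node_medians <- tapply(fit[["(response)"]], fit[["(fitted)"]], median)

print(node_means)
print(node_medians)
```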

In this case, let's look at absolute and relative frequencies in tabular form:

tab <- xtabs(~ `(fitted)` + `(response)`, data = fitted(ct))
tab
##         (response)
## (fitted) setosa versicolor virginica
##        2     50          0         0
##        5      0         45         1
##        6      0          4         4
##        7      0          1        45
ptab <- prop.table(tab, 1)
ptab
##         (response)
## (fitted)     setosa versicolor  virginica
##        2 1.00000000 0.00000000 0.00000000
##        5 0.00000000 0.97826087 0.02173913
##        6 0.00000000 0.50000000 0.50000000
##        7 0.00000000 0.02173913 0.97826087

An alternative route to obtain the frequency table tab would be: table(predict(ct, type = "node"), iris$Species).

If you want to turn any of these into a data frame, as.data.frame() works just fine (probably plus some relabeling of the variables...):

as.data.frame(ptab)
##    X.fitted. X.response.       Freq
## 1          2      setosa 1.00000000
## 2          5      setosa 0.00000000
## 3          6      setosa 0.00000000
## 4          7      setosa 0.00000000
## 5          2  versicolor 0.00000000
## 6          5  versicolor 0.97826087
## 7          6  versicolor 0.50000000
## 8          7  versicolor 0.02173913
## 9          2   virginica 0.00000000
## 10         5   virginica 0.02173913
## 11         6   virginica 0.50000000
## 12         7   virginica 0.97826087
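The relabeling mentioned above could look like the following sketch. To stay self-contained it rebuilds the frequency table from the counts printed earlier rather than refitting the tree; the column names node, species, and proportion are illustrative choices.

```r
# Rebuild the frequency table from the counts shown above (no partykit needed).
counts <- matrix(c(50,  0,  0,
                    0, 45,  1,
                    0,  4,  4,
                    0,  1, 45),
                 nrow = 4, byrow = TRUE,
                 dimnames = list("(fitted)" = c("2", "5", "6", "7"),
                                 "(response)" = c("setosa", "versicolor", "virginica")))
tab  <- as.table(counts)
ptab <- prop.table(tab, 1)  # row-wise proportions

# responseName renames the value column; setNames() relabels all columns.
df <- as.data.frame(ptab, responseName = "proportion")
df <- setNames(df, c("node", "species", "proportion"))
print(df)
```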
