

Extracting class distribution at each final node of trees in random forest in R

I'm using the randomForest package in R. As far as I understand, this package only gives me the class assigned to each instance at the final (terminal) nodes of each tree, but I need to know the class distribution at each node.

Say that, once the whole forest is trained, a terminal node of one tree contains 10 instances of class 0 and 20 instances of class 1. Rather than only learning that the class assigned to this node is 1 (because most of its instances are from class 1), I want to know the class counts (10 and 20). Is there any way to do this? Thanks in advance for your help.

You can use predict.randomForest(..., type = "prob") to get predicted probabilities. However, they are calculated by aggregating the class predictions (not the predicted probabilities!) of the individual decision trees. If 10 trees predict class = 1 and 30 trees predict class = 0, this call yields a predicted probability of 10/40 = 0.25 for class 1.
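To check the vote-share behaviour described above, here is a minimal sketch on simulated data (the data set, ntree = 40 and nodesize = 20 are arbitrary choices for illustration). It compares the type = "prob" output with the fraction of trees voting for class "1", taken from the per-tree predictions that predict.all = TRUE returns, and assumes the default norm.votes = TRUE.

```r
library(randomForest)

set.seed(1)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- factor(ifelse(X$x1 + rnorm(n) > 0, "1", "0"))

rf <- randomForest(X, y, ntree = 40, nodesize = 20)

# Forest-level "probabilities": an n x 2 matrix with columns "0" and "1"
prob <- predict(rf, X, type = "prob")

# Per-tree class labels: an n x ntree character matrix
votes <- predict(rf, X, predict.all = TRUE)$individual

# Share of the 40 trees that vote for class "1"
vote_share_1 <- rowMeans(votes == "1")

# TRUE if "prob" is nothing more than the share of tree votes
isTRUE(all.equal(unname(prob[, "1"]), unname(vote_share_1)))
```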

If you require "true" tree-level probabilities, you must switch to a different RF implementation. For example, scikit-learn's RandomForestClassifier works this way: its predict_proba averages the per-tree class probabilities, where a single tree's probability is the fraction of training samples of each class in the leaf.
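randomForest itself does not offer this, but the averaging described for scikit-learn can be roughly emulated in R. The sketch below is an illustration under assumptions, not a randomForest feature: it keeps the in-bag counts (keep.inbag = TRUE), turns each terminal node into class proportions weighted by how often each training row was drawn, and averages those proportions over the trees. The helpers leaf_probs() and soft_prob() are hypothetical, and the weighting assumes default sampling (replace = TRUE, no classwt or strata).

```r
library(randomForest)

set.seed(123)
obs <- 2000
X <- data.frame(x = rnorm(obs))
y <- factor((X$x + rnorm(obs)) >= 0)   # levels "FALSE", "TRUE"

# keep.inbag = TRUE stores how often each row was drawn for each tree
rf <- randomForest(X, y, keep.inbag = TRUE, nodesize = 15, ntree = 50)

# Terminal node of every training row in every tree (obs x ntree matrix)
train_nodes <- attr(predict(rf, X, nodes = TRUE), "nodes")

# Hypothetical helper: class proportions in each terminal node of tree k,
# weighting every training row by its in-bag count for that tree
leaf_probs <- function(k) {
  counts <- tapply(rf$inbag[, k], list(train_nodes[, k], y), sum)
  counts[is.na(counts)] <- 0           # node/class combinations with no rows
  counts / rowSums(counts)             # divide each row by its node total
}

# Hypothetical helper: drop new rows down every tree and average the
# leaf proportions (tree-level "soft" voting)
soft_prob <- function(newdata) {
  new_nodes <- attr(predict(rf, newdata, nodes = TRUE), "nodes")
  total <- matrix(0, nrow(newdata), nlevels(y),
                  dimnames = list(NULL, levels(y)))
  for (k in seq_len(rf$ntree)) {
    p <- leaf_probs(k)
    total <- total + p[as.character(new_nodes[, k]), , drop = FALSE]
  }
  total / rf$ntree
}

newX <- data.frame(x = c(-2, 0, 2))
soft_prob(newX)                    # averaged leaf proportions
predict(rf, newX, type = "prob")   # randomForest's share of tree votes
```

Every terminal node is built from in-bag rows, so no node total is zero and every node a new row can reach has a row in leaf_probs(k).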

Good question!

This is only an issue for RF classification if you do not grow the trees fully: a fully grown tree ends in (essentially) pure terminal nodes, so the majority vote already tells you the node's class distribution. To prevent fully grown trees you would have to set nodesize > 1 and/or maxnodes < N.samples. The randomForest implementation stores only the majority-vote prediction of each classification node; see getTree(rf).
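To see what getTree() actually keeps, here is a small sketch on simulated data (the data and settings are arbitrary). Terminal nodes have status -1, and the only class information stored for them is the single label in the prediction column; there is no per-class count.

```r
library(randomForest)

set.seed(123)
X <- data.frame(x = rnorm(2000))
y <- factor((X$x + rnorm(2000)) >= 0)
rf <- randomForest(X, y, nodesize = 15, ntree = 2)

# One row per node of tree 1; labelVar = TRUE shows labels instead of codes
tr <- getTree(rf, k = 1, labelVar = TRUE)
names(tr)

# Terminal nodes (status -1) carry one predicted label and no class counts
leaves <- tr[tr$status == -1, ]
head(leaves)
```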

I had the same issue writing the forestFloor package for visualizing feature contributions. I had to recalculate all node states of the trees with a recursive Rcpp function. I think you will have to do the same, or patch the package's source code. You could also ask Liaw, the maintainer of randomForest, to implement it, or ask me to implement an output of the computed node states. There is a small chance that one of the other random forest implementations already supports more detailed node states.
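Short of patching the package, the terminal-node class counts for the training data can also be reconstructed in plain R from two things randomForest does expose: the in-bag counts (rf$inbag, kept with keep.inbag = TRUE) and the terminal node each row lands in (the "nodes" attribute of predict(..., nodes = TRUE)). This is a minimal sketch, not the forestFloor approach; the helper node_class_counts() is hypothetical, and counting each row by its in-bag count assumes the default bootstrap (replace = TRUE, no classwt or sampsize strata).

```r
library(randomForest)

set.seed(123)
obs <- 2000
X <- data.frame(x = rnorm(obs))
y <- factor((X$x + rnorm(obs)) >= 0)

# keep.inbag = TRUE stores how many times each row was drawn for each tree
rf <- randomForest(X, y, keep.inbag = TRUE, nodesize = 15, ntree = 2)

# Terminal node reached by every training row in every tree (obs x ntree)
nodes <- attr(predict(rf, X, nodes = TRUE), "nodes")

# Hypothetical helper: class counts per terminal node of tree k, counting each
# row as many times as it appears in that tree's bootstrap sample
node_class_counts <- function(k) {
  counts <- tapply(rf$inbag[, k], list(node = nodes[, k], class = y), sum)
  counts[is.na(counts)] <- 0
  counts
}

counts_tree1 <- node_class_counts(1)
head(counts_tree1)   # one row per terminal node, one column per class

# Cross-check against the single label randomForest stored for each node
stored <- getTree(rf, k = 1, labelVar = TRUE)$prediction[as.integer(rownames(counts_tree1))]
majority <- colnames(counts_tree1)[max.col(counts_tree1, ties.method = "first")]
untied <- counts_tree1[, 1] != counts_tree1[, 2]
table(stored[untied] == majority[untied])
```

lapply(seq_len(rf$ntree), node_class_counts) gives one such table per tree. Out-of-bag rows contribute weight 0, so each table describes that tree's bootstrap sample, which is what the node's majority vote was computed from; the cross-check skips tied nodes because randomForest breaks ties at random.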

randomForest only outputs/stores the majority vote of each terminal node:

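library(randomForest)
set.seed(123)
obs = 2000
X = matrix(rnorm(obs))              # a single numeric feature
y = factor((X + rnorm(obs)) >= 0)   # noisy binary label (FALSE/TRUE)

plot(X, col = y)

rf = randomForest(X, y,
                  keep.inbag = TRUE, # store the bootstrap in-bag counts
                  nodesize = 15,     # stop splitting small nodes, so leaves can be impure
                  ntree = 2)

# notice that with only 2 trees the "prob" predictions can only be 0, 0.5 or 1
print(head(predict(rf, X, type = "prob"), 15)) # (NB: these predictions are not OOB-CV!)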

   FALSE TRUE
1    1.0  0.0
2    0.0  1.0
3    0.0  1.0
4    0.0  1.0
5    0.5  0.5
6    0.0  1.0
7    0.0  1.0
8    1.0  0.0
9    1.0  0.0
10   1.0  0.0
11   0.0  1.0
12   0.0  1.0
13   0.0  1.0
14   0.0  1.0
15   1.0  0.0
