Scikit Learn Random forest classifier: How to produce a plot of OOB error against number of trees


Edit 2: There is now a lovely example in the sklearn documentation on this.


To see how many trees my forest needs, I'd like to plot the OOB error as the number of trees in the forest increases. I'm using Python's sklearn.ensemble.RandomForestClassifier, but I can't find a way to predict using a subset of the trees in the forest. I could fit a new random forest on each iteration with an increasing number of trees, but that is too expensive.

A similar task seems to be possible with the Gradient Boosting object using the staged_decision_function method. See this example.
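For comparison, here is a minimal sketch of the staged approach on a gradient boosting model. It uses staged_predict (the class-label counterpart of staged_decision_function) and measures the error on a held-out split rather than on OOB samples. The synthetic data, the split, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and a held-out split stand in for real data (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=50, random_state=0)
gbm.fit(X_train, y_train)

# staged_predict yields the predicted labels after each boosting stage,
# so one fitted model gives the whole error-vs-stages curve.
gbm_staged_error = np.array(
    [np.mean(pred != y_val) for pred in gbm.staged_predict(X_val)])
```

gbm_staged_error[k - 1] is then the validation error when only the first k stages are used.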

This is quite a simple procedure in R and can be achieved by simply calling plot(randomForestObject):

[Figure: random forest OOB error against number of trees]


-- Edit -- I now see that the RandomForestClassifier object has an attribute estimators_, which holds all the DecisionTreeClassifier objects in a list. So to solve this I can iterate through that list, predict with each tree, and take a 'cumulative average'. However, is there really no easier, already implemented way to do this?
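The 'cumulative average' idea could look roughly like the sketch below. Note that it scores on a held-out validation split, not on OOB samples, because a true OOB curve would additionally need the bootstrap indices of each tree. The synthetic data, the split, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data and a held-out split stand in for real data (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# Class probabilities of every tree: shape (n_trees, n_samples, n_classes).
per_tree = np.stack([tree.predict_proba(X_val) for tree in forest.estimators_])

# Running mean over the first k trees, for k = 1 .. n_trees.
counts = np.arange(1, len(per_tree) + 1)[:, None, None]
cumulative = np.cumsum(per_tree, axis=0) / counts

# Predicted label and validation error rate for each sub-forest size.
staged_pred = forest.classes_[np.argmax(cumulative, axis=2)]
staged_error = (staged_pred != y_val).mean(axis=1)
```

staged_error[k - 1] is the validation error of the first k trees, so plotting it against range(1, 51) gives a curve comparable to R's plot(randomForestObject), only on held-out data.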

There is a discussion and code in this issue: https://github.com/scikit-learn/scikit-learn/issues/4273

You can add trees one by one like this:

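from sklearn.ensemble import RandomForestClassifier

# X, y are the training features and labels
n_estimators = 100
# warm_start=True keeps the already fitted trees, so each fit() only adds new ones
forest = RandomForestClassifier(warm_start=True, oob_score=True)

for i in range(1, n_estimators + 1):
    forest.set_params(n_estimators=i)
    forest.fit(X, y)
    # oob_score_ is an accuracy, so the OOB error is 1 - oob_score_
    print(i, forest.oob_score_)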
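To turn this loop into the plot asked for, one option is to collect 1 - oob_score_ for each forest size and hand the result to matplotlib. The sketch below is only an illustration: the synthetic data, the helper names oob_error_curve and plot_oob_error, and the choice to start at 15 trees in steps of 5 (very small forests leave some samples without any OOB prediction) are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for the real X, y (assumption).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)


def oob_error_curve(X, y, min_trees=15, max_trees=100, step=5, random_state=0):
    """Return a list of (n_trees, oob_error) pairs from one warm-started forest."""
    forest = RandomForestClassifier(warm_start=True, oob_score=True,
                                    random_state=random_state)
    curve = []
    for n in range(min_trees, max_trees + 1, step):
        forest.set_params(n_estimators=n)
        forest.fit(X, y)  # only the newly requested trees are grown
        curve.append((n, 1.0 - forest.oob_score_))
    return curve


def plot_oob_error(curve):
    """Plot OOB error rate against the number of trees (needs matplotlib)."""
    import matplotlib.pyplot as plt

    n_trees, errors = zip(*curve)
    plt.plot(n_trees, errors)
    plt.xlabel("n_estimators")
    plt.ylabel("OOB error rate")
    plt.show()


curve = oob_error_curve(X, y)
```

Calling plot_oob_error(curve) then draws the curve. Because of warm_start=True, the total cost is roughly that of fitting a single forest with max_trees trees, plus recomputing the OOB score at each step.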

The solution you propose also needs to get the OOB indices for each tree, because you don't want to compute the score on all the training data.

I still feel this is a strange thing to do, as there is really no natural ordering of the trees in the forest. Can you explain what your use case is? Do you want to find the minimum number of trees for a given accuracy to reduce prediction time? If you want fast prediction time, I'd suggest using GradientBoostingClassifier, which is usually much faster.


