
Out-Of-Bag error plot in Random Forest

I tried to fit a Random Forest to my data set to do a classification between a Control group and an Alzheimer group. In the first try I got the left OOB error plot, and in the second try, where I decreased the number of variables in my data set, I got the right-side OOB error plot. My problem is, comparing these two plots, which is the better OOB plot? Should the class error for Alzheimer and Control be close to the OOB curve of the forest? If yes, why?

[image: the two OOB error plots]

The plot on the right has a better OOB error. I assume that the Alzheimer and Control lines are also OOB errors, but calculated for the particular classes. The random forest predictor is constructed by bootstrapping a fraction of the samples; the OOB error is calculated on the samples that were not selected (out of the bag) at each iteration of the algorithm. Therefore, the OOB error is an estimate of the model's performance as you build it, as described by Breiman, and smaller errors are of course better.
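Here is a minimal sketch of how such an OOB error curve is produced. It uses Python/scikit-learn on a synthetic data set as a stand-in (your plots were presumably made with R's randomForest, so the library and data here are assumptions, not your setup): the forest is grown tree by tree, and each sample is scored only by the trees whose bootstrap sample left it out.

```python
# Sketch only: synthetic data standing in for the Control/Alzheimer set.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=10, random_state=0)

# warm_start=True lets us grow the same forest incrementally;
# oob_score=True scores each sample using only the trees that
# did NOT see it in their bootstrap sample.
clf = RandomForestClassifier(warm_start=True, oob_score=True,
                             random_state=0)

n_trees, oob_errors = list(range(15, 301, 5)), []
for n in n_trees:
    clf.set_params(n_estimators=n)
    clf.fit(X, y)
    oob_errors.append(1.0 - clf.oob_score_)  # OOB error = 1 - OOB accuracy

plt.plot(n_trees, oob_errors)
plt.xlabel("Number of trees")
plt.ylabel("OOB error")
plt.show()
```

The curve typically drops quickly and then flattens as adding trees stops improving the out-of-bag estimate, which is the shape visible in both of your plots.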

"should the class error for Alzheimer and Control be closer to OOB curve of the Forest?." “阿尔茨海默氏病和对照的分类误差是否应该更接近森林的OOB曲线?” It depends on how good your model is at predicting each class. 这取决于您的模型预测每个班级的能力。 If the classification error is similar for both classes, then the OOB error will be close to both. 如果两个类的分类错误相似,则OOB错误将接近两个类。
