Get the accuracy of a random forest in R

I have built a random forest from my data:

library(randomForest)

fit <- randomForest(churn ~ ., data = data_churn[3:17], ntree = 1,
                    importance = TRUE, proximity = TRUE)

I can easily see my confusion matrix:

> conf <- fit$confusion
> conf
     No Yes class.error
No  945  80  0.07804878
Yes  84 101  0.45405405

Now I need to know the accuracy of the random forest. I searched around and found that the caret package has a confusionMatrix function that takes a confusion matrix and returns the accuracy (along with many other values). However, the function needs another parameter called "reference". My question is: how can I provide a reference to this function to get the accuracy of my random forest? And is this the correct way to get the accuracy of a random forest?

Use randomForest(..., do.trace=TRUE) to print the OOB error during training, both overall and per class, as trees are added.

(FYI, you chose ntree=1, so you get a single decision tree rather than a forest. That defeats the purpose of using a random forest, which combines many trees, each grown on a random subset of the samples and features. You probably want to try a range of ntree values.)
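A minimal sketch of both suggestions, assuming the randomForest package is installed. The built-in iris data is only a stand-in for the churn data in the question, and ntree = 200 and do.trace = 50 are arbitrary illustrative values, not recommendations.

```r
library(randomForest)

set.seed(42)  # make the bootstrap samples reproducible

# iris stands in for data_churn here (assumption: any factor response works the same way)
fit <- randomForest(Species ~ ., data = iris, ntree = 200,
                    do.trace = 50)  # print the running OOB error every 50 trees

# err.rate has one row per number of trees: an "OOB" column plus one column per class
oob_error    <- fit$err.rate[fit$ntree, "OOB"]
oob_accuracy <- 1 - oob_error
print(oob_accuracy)
```

Calling plot(fit) afterwards draws the same error rates against the number of trees, which can help when deciding how large ntree needs to be.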

After training, you can read the per-class error from the rightmost column of the confusion matrix, as you already found:

> fit$confusion[, 'class.error']
        No        Yes
0.07804878 0.45405405

(You may also want to set options(digits=3) so that you don't see so many decimal places.)

Converting those per-class errors (accuracy = 1 - error) into a single overall accuracy figure is easy. You could use the mean, the class-weighted mean, the harmonic mean (of the accuracies, not of the errors), and so on. Which one is appropriate depends on your application and the relative penalty for each kind of misclassification. Your example is simple, since it has only two classes.
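As a rough illustration, the sketch below computes those summaries in base R from the counts posted in the question. The matrix is typed in by hand here; with a fitted model you would presumably take fit$confusion without its class.error column instead. The variable names are made up for this example.

```r
# OOB confusion matrix copied from the question (rows = actual class, columns = predicted)
conf <- matrix(c(945,  80,
                  84, 101),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual    = c("No", "Yes"),
                               predicted = c("No", "Yes")))

class_acc <- diag(conf) / rowSums(conf)    # per-class accuracy = 1 - class.error
overall   <- sum(diag(conf)) / sum(conf)   # correctly classified cases / all cases

mean_acc     <- mean(class_acc)                          # each class counts equally
weighted_acc <- weighted.mean(class_acc, rowSums(conf))  # weighted by class size
harmonic_acc <- length(class_acc) / sum(1 / class_acc)   # penalizes the weaker class

print(round(c(overall = overall, mean = mean_acc,
              weighted = weighted_acc, harmonic = harmonic_acc), 3))
```

Note that the class-weighted mean is the same number as the overall accuracy (1046 / 1210, about 0.864), while the unweighted and harmonic means come out lower because the "Yes" class is predicted much less accurately.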

(Alternatively, there are more elaborate measures of inter-rater agreement, such as Cohen's kappa.)
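One such agreement measure is Cohen's kappa, which corrects the observed accuracy for the agreement expected by chance. The sketch below computes it in base R from the counts in the question; cohen_kappa is just a variable name chosen for this example, not a function from any package.

```r
# Same OOB confusion matrix as in the question (rows = actual, columns = predicted)
conf <- matrix(c(945,  80,
                  84, 101),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual    = c("No", "Yes"),
                               predicted = c("No", "Yes")))

n  <- sum(conf)
po <- sum(diag(conf)) / n                       # observed agreement (the overall accuracy)
pe <- sum(rowSums(conf) * colSums(conf)) / n^2  # agreement expected by chance alone

cohen_kappa <- (po - pe) / (1 - pe)
print(cohen_kappa)
```

A kappa well below the raw accuracy, as here, suggests that much of the accuracy comes from the majority class simply being common.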
