Get the accuracy of a random forest in R
I have created a random forest out of my data:
fit=randomForest(churn~., data=data_churn[3:17], ntree=1,
importance=TRUE, proximity=TRUE)
I can easily see my confusion matrix:
> conf <- fit$confusion
> conf
     No Yes class.error
No  945  80  0.07804878
Yes  84 101  0.45405405
Now I need to know the accuracy of the random forest. I searched around and found that the caret library has a confusionMatrix method that takes a confusion matrix and returns the accuracy (along with many other values). However, the method needs another parameter called "reference". My question is: how can I provide a reference for the method to get the accuracy of my random forest? And is this the correct way to get the accuracy of a random forest?
Use randomForest(..., do.trace=TRUE) to see the OOB (out-of-bag) error during training, broken down by class and by number of trees (ntree).
(FYI, you chose ntree=1, so you get a single decision tree rather than a forest. That defeats the purpose of using a random forest, which relies on averaging many trees, each grown on a random subset of both features and samples. You probably want to try a range of ntree values.)
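As a rough sketch of the two points above, the following grows a larger forest, prints the OOB error as training progresses, and reads the final OOB error back from the fitted object. It assumes the randomForest package is installed and uses the built-in iris data purely as a stand-in, since data_churn is not available here; the seed and ntree = 500 are illustrative choices, not recommendations.

```r
library(randomForest)   # assumes the randomForest package is installed

set.seed(42)            # illustrative seed, only for reproducibility

# iris is a stand-in dataset; substitute churn ~ . and your own data frame.
# do.trace = 100 prints the running OOB error every 100 trees.
fit <- randomForest(Species ~ ., data = iris, ntree = 500, do.trace = 100)

# err.rate has one row per tree; the "OOB" column is the cumulative OOB error
# of the forest built from the first i trees.
oob_error    <- fit$err.rate[fit$ntree, "OOB"]
oob_accuracy <- 1 - oob_error
print(oob_accuracy)

# OOB error at a few forest sizes, to see how it settles as trees are added.
print(fit$err.rate[c(1, 10, 100, 500), "OOB"])
```

Because the OOB error is computed only on samples each tree did not see, 1 minus that error is a reasonable accuracy estimate without a separate test set.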
After training, you can get the per-class error from the rightmost column of the confusion matrix, as you already found:
> fit$confusion[, 'class.error']
        No        Yes
0.07804878 0.45405405
(Also, you probably want to set options(digits=3) so you don't see those excessive decimal places.)
As for converting that list of class errors (accuracies = 1 - errors) into one overall accuracy number, that's easy to do. You could use the mean, the class-weighted mean, the harmonic mean (of accuracies, not of errors), etc. Which one is appropriate depends on your application and the relative penalty for misclassifying each class. Your example is simple: it has only two classes.
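To make those options concrete, here is a minimal base-R sketch that rebuilds the confusion counts from the question by hand and computes each summary. The variable names are made up for illustration; with a real fit you would take the counts from fit$confusion after dropping its class.error column.

```r
# Confusion counts copied from the question (rows = actual, columns = predicted).
conf <- matrix(c(945, 84, 80, 101), nrow = 2,
               dimnames = list(actual = c("No", "Yes"),
                               predicted = c("No", "Yes")))

class_n   <- rowSums(conf)          # observations per actual class
class_acc <- diag(conf) / class_n   # per-class accuracy = 1 - class.error

overall_acc  <- sum(diag(conf)) / sum(conf)          # share of all rows classified correctly
mean_acc     <- mean(class_acc)                      # unweighted mean over classes
weighted_acc <- weighted.mean(class_acc, class_n)    # weights = class sizes
harmonic_acc <- length(class_acc) / sum(1 / class_acc)

print(round(c(overall = overall_acc, mean = mean_acc,
              weighted = weighted_acc, harmonic = harmonic_acc), 3))
# overall and weighted come out near 0.864, mean near 0.734, harmonic near 0.686
```

Weighting the per-class accuracies by class size reproduces the plain overall accuracy, whereas the unweighted and harmonic means are pulled down by the poorly predicted "Yes" class.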
(Alternatively, there are more complicated measures, e.g. of inter-rater agreement.)
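One such agreement measure is Cohen's kappa, which corrects the observed accuracy for the agreement expected by chance. Picking kappa here is an assumption about what "inter-rater agreement" is meant to cover; the sketch below computes it in base R from the counts in the question, with illustrative variable names.

```r
# Confusion counts copied from the question (rows = actual, columns = predicted).
conf <- matrix(c(945, 84, 80, 101), nrow = 2,
               dimnames = list(actual = c("No", "Yes"),
                               predicted = c("No", "Yes")))

n  <- sum(conf)
po <- sum(diag(conf)) / n                        # observed agreement (overall accuracy)
pe <- sum(rowSums(conf) * colSums(conf)) / n^2   # agreement expected by chance

cohen_kappa <- (po - pe) / (1 - pe)
print(round(cohen_kappa, 3))   # about 0.472
```

A kappa well below the raw accuracy, as here, suggests that much of the accuracy comes from the majority "No" class.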