Getting the accuracy of a random forest in R

Question

I have created a random forest from my data:

library(randomForest)
fit <- randomForest(churn ~ ., data = data_churn[3:17], ntree = 1,
                    importance = TRUE, proximity = TRUE)

I can easily see my confusion matrix:

> conf <- fit$confusion
> conf
     No Yes class.error
No  945  80  0.07804878
Yes  84 101  0.45405405

Now I need to know the accuracy of the random forest. I searched around and found that the caret library has a confusionMatrix function that takes a confusion matrix and returns the accuracy (along with many other values). However, the function needs another parameter called "reference". My question is: how can I provide a reference to this function to get the accuracy of my random forest? And is this the correct way to get the accuracy of a random forest?

Answer 1

Use randomForest(..., do.trace=TRUE) to see the OOB error during training, both per class and by number of trees (ntree).

(FYI, you chose ntree=1, so you get a single decision tree rather than a forest. That defeats the purpose of using RF, which aggregates many trees grown on random subsets of both the samples and the features. You probably want to try larger ntree values.)
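
As a rough sketch of the two points above, the following trains a larger forest and looks at how the OOB error changes with the number of trees. It uses the built-in iris data as a stand-in for data_churn (an assumption, since the original data is not shown), assumes the randomForest package is installed, and picks ntree = 200 and the checkpoints arbitrarily.

```r
library(randomForest)

set.seed(42)  # make the bootstrap samples reproducible

# iris stands in for data_churn here; Species plays the role of churn.
# do.trace = 50 prints the OOB and per-class error every 50 trees.
fit <- randomForest(Species ~ ., data = iris, ntree = 200,
                    importance = TRUE, do.trace = 50)

# err.rate has one row per tree count: column "OOB" plus one column per class.
checkpoints <- c(1, 10, 50, 200)
print(fit$err.rate[checkpoints, ])

# OOB accuracy of the full forest is 1 minus the OOB error in the last row.
oob_accuracy <- 1 - fit$err.rate[fit$ntree, "OOB"]
print(oob_accuracy)
```

If the error has flattened out well before the last row, adding more trees is unlikely to change the accuracy much.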

After training, you can get the per-class error from the rightmost column of the confusion matrix, as you already found:

> fit$confusion[, 'class.error']
        No        Yes
0.07804878 0.45405405

(You probably also want to set options(digits=3) so you don't see those excessive decimal places.)

As for converting that list of class errors (accuracies = 1 - errors) into one overall accuracy number, that's easy to do. You could use the mean, the class-weighted mean, the harmonic mean (of accuracies, not of errors), etc. It depends on your application and the relative penalty for each kind of misclassification. Your example is simple: it has only two classes.
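
A minimal sketch of those summaries in base R, using the counts from the confusion matrix in the question. The variable names are illustrative only. With a real fit you would first drop the class.error column, e.g. fit$confusion[, -ncol(fit$confusion)].

```r
# Confusion matrix from the question: rows are true classes, columns are predictions.
conf <- matrix(c(945, 80,
                 84, 101),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("No", "Yes"),
                               predicted = c("No", "Yes")))

class_counts   <- rowSums(conf)               # number of true cases per class
class_accuracy <- diag(conf) / class_counts   # same as 1 - class.error

mean_accuracy     <- mean(class_accuracy)                         # classes count equally
weighted_accuracy <- weighted.mean(class_accuracy, class_counts)  # weighted by class size
harmonic_accuracy <- length(class_accuracy) / sum(1 / class_accuracy)

# Overall accuracy: correct predictions divided by all predictions.
overall_accuracy <- sum(diag(conf)) / sum(conf)

print(round(c(mean = mean_accuracy, weighted = weighted_accuracy,
              harmonic = harmonic_accuracy, overall = overall_accuracy), 4))
```

Note that the class-weighted mean is the same number as the overall accuracy (1046 / 1210, about 0.86), which is dominated by the large "No" class. The unweighted and harmonic means are lower because the "Yes" class is only about 55% accurate.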

(Or, for example, there are more complicated measures such as inter-rater agreement.)
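
On the "reference" argument asked about in the question: caret's confusionMatrix takes the predicted classes as data and the true classes as reference. The sketch below assumes the caret and randomForest packages are installed and again uses iris as a stand-in for the churn data. It passes the out-of-bag predictions stored in the fit, so the reported accuracy is an OOB estimate rather than a held-out test accuracy.

```r
library(randomForest)
library(caret)

set.seed(42)
fit <- randomForest(Species ~ ., data = iris, ntree = 200)

# data      = predicted classes (the out-of-bag predictions stored in the fit)
# reference = true classes (the response the forest was trained on)
cm <- confusionMatrix(data = fit$predicted, reference = fit$y)

# Overall accuracy, plus Cohen's kappa as one example of an agreement measure.
print(cm$overall[c("Accuracy", "Kappa")])
```

For accuracy on new data, one would instead pass predict(fit, newdata = test_set) as data and the true labels of that set as reference, where test_set is a hypothetical held-out data frame.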

Answer 1: 6, accepted, 2015-06-08 22:09:44
