Getting random forest prediction accuracy for a continuous variable in R

Question

I'm trying to predict a continuous variable (a count) in R with random forest. The predicted variable ranges from min=1 to max=1000.

I tried getting the prediction accuracy with "confusionMatrix", but naturally I get an error about a different number of levels between the predictions and the observed values.

What is the best method of getting prediction accuracy in these circumstances?

Answer 1

@mishakob

Roughly speaking, the root mean squared error (RMSE) can be understood as a normalized deviation between the actual and fitted values. It can be obtained as follows.

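library(randomForest)
set.seed(1237)
# A numeric response (Sepal.Length) makes randomForest fit a regression forest
iris.rg <- randomForest(Sepal.Length ~ ., data=iris, importance=TRUE,
                        proximity=TRUE)

# $predicted holds the out-of-bag predictions for the training rows
sqrt(sum((iris.rg$predicted - iris$Sepal.Length)^2) / nrow(iris))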
[1] 0.3706187
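
The same calculation can be wrapped in a small helper that also reports the mean absolute error and R-squared. This is only a sketch in base R: `regression_metrics` is a hypothetical name, not a randomForest function, and the toy vectors are made up for illustration.

```r
# Hypothetical helper (not part of randomForest): common regression
# error metrics computed from observed and predicted values.
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- predicted - actual
  c(
    rmse = sqrt(mean(err^2)),                  # root mean squared error
    mae = mean(abs(err)),                      # mean absolute error
    r_squared = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}

# Toy vectors, made up for illustration
actual <- c(3, 5, 7, 9)
predicted <- c(2, 5, 8, 11)
metrics <- regression_metrics(actual, predicted)
print(metrics)
```

With the forest fitted above, `regression_metrics(iris$Sepal.Length, iris.rg$predicted)` should give the out-of-bag version of these numbers. For a held-out test set, the predictions would come from `predict(iris.rg, newdata = test)` instead, where `test` is an assumed data frame with the same columns.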

Answer 2

randomForest should only show confusion matrices for categorical outcomes, so try ensuring your outcome is numeric. It will then show mean squared residuals instead. For example:

library(randomForest)
# This is probably what you're seeing
randomForest(as.factor(Sepal.Length) ~ Sepal.Width, data=iris)
# This is what you want to see
randomForest(Sepal.Length ~ Sepal.Width, data=iris)
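
If the count column turns out to be stored as a factor, it needs converting before it is passed to randomForest. The sketch below uses a made-up vector (`count_factor` is a hypothetical name) to show one common pitfall: calling `as.numeric()` directly on a factor returns the internal level codes, not the original values.

```r
# Hypothetical count column stored as a factor (e.g. after a careless import)
count_factor <- factor(c("10", "250", "1000", "1"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(count_factor)

# Converting via character recovers the original numbers
count_numeric <- as.numeric(as.character(count_factor))

print(is.factor(count_factor))    # TRUE: randomForest would run classification
print(is.numeric(count_numeric))  # TRUE: randomForest would run regression
```

Checking `class()` or `str()` on the outcome column before fitting is a quick way to see which mode randomForest will choose.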

Answer 1: score 4, accepted, 2015-05-02 03:59:15

Answer 2: score 1, 2015-05-01 23:17:07
