Getting random forest prediction accuracy for a continuous variable in R

Question

I'm trying to predict a continuous variable (a count) in R with random forest. The values of the predicted variable range from min=1 to max=1000.

I tried getting the prediction accuracy with "confusionMatrix", but naturally I get an error because the predictions and the observed values have a different number of levels.

What is the best method of getting prediction accuracy in these circumstances?

Answer 1

@mishakob

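Roughly speaking, the root mean squared error (RMSE) can be understood as the typical deviation between actual and fitted values, expressed in the units of the response. It can be obtained as follows. Note that iris.rg$predicted holds the out-of-bag predictions, so this is an out-of-bag error rather than an error on the training fit.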

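library(randomForest)
set.seed(1237)  # make the fitted forest reproducible
# Regression forest: the response Sepal.Length is numeric
iris.rg <- randomForest(Sepal.Length ~ ., data=iris, importance=TRUE,
                        proximity=TRUE)

# RMSE computed from the out-of-bag predictions stored in iris.rg$predicted
sqrt(sum((iris.rg$predicted - iris$Sepal.Length)^2) / nrow(iris))
# [1] 0.3706187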
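
RMSE is only one way to summarise regression accuracy. As a minimal sketch in base R, the helper below computes RMSE, mean absolute error, and R-squared from any pair of actual and predicted vectors. The function name regression_metrics and the toy vectors are hypothetical and only serve as illustration; in practice you would pass your observed counts and the forest's predictions.

```r
# Hypothetical helper (not part of randomForest): regression accuracy
# metrics computed from two numeric vectors of equal length.
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- actual - predicted
  c(
    rmse      = sqrt(mean(err^2)),   # root mean squared error
    mae       = mean(abs(err)),      # mean absolute error
    r_squared = 1 - sum(err^2) / sum((actual - mean(actual))^2)
  )
}

# Toy data, invented for illustration only
actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 39)

metrics <- regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are in the units of the response, so with counts between 1 and 1000 they read directly as "typical error in counts"; R-squared is unitless and easier to compare across data sets.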

Answer 2

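randomForest only reports a confusion matrix when the outcome is a factor (classification), so make sure your outcome is numeric. The model is then fitted as a regression forest, and the printed summary shows the mean of squared residuals and the percentage of variance explained instead. For example: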

library(randomForest)
# This is probably what you're seeing
randomForest(as.factor(Sepal.Length) ~ Sepal.Width, data=iris)
# This is what you want to see
randomForest(Sepal.Length ~ Sepal.Width, data=iris)
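
If you need these numbers programmatically rather than in the printed summary, a regression fit stores them in its mse and rsq components, one value per tree count. The sketch below assumes the randomForest package is installed; the variable names fit, oob_mse, oob_rmse and oob_rsq are chosen here for illustration.

```r
library(randomForest)

set.seed(1237)  # arbitrary seed, for reproducibility only
fit <- randomForest(Sepal.Length ~ Sepal.Width, data = iris)

# mse and rsq have one entry per number of trees grown so far;
# the last entry corresponds to the full forest.
oob_mse  <- tail(fit$mse, 1)   # out-of-bag mean of squared residuals
oob_rmse <- sqrt(oob_mse)      # same error, in the units of the response
oob_rsq  <- tail(fit$rsq, 1)   # out-of-bag pseudo R-squared

print(c(mse = oob_mse, rmse = oob_rmse, rsq = oob_rsq))
```

These are out-of-bag estimates; for an independent check you could instead hold out a test set and compare predict(fit, newdata) with the observed values.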
