Best way to evaluate a random forest model's accuracy on continuous data?

Question

I have a random forest model that predicts a variable. This variable is not a categorical class but rather a number from 0 to 1. What is the best way to evaluate the accuracy of the generated models in this case?

Should I split the data into training and test sets and then simply calculate the linear correlation between predicted and observed values in the test set?

Is there a more elegant solution? If so, which package implements it?

Answer 1

You can of course hold out some data as a test set, but with a random forest this is often not necessary, because the model comes with a "built-in" out-of-bag (OOB) error estimate: each observation is predicted only by the trees whose bootstrap sample did not include it. Here is an example on the "mtcars" dataset that ends by plotting the OOB error against the number of trees:

install.packages("randomForest")  # only needed once, if the package is not installed yet
library(randomForest)

head(mtcars)
set.seed(1)
fit <- randomForest(mpg ~ ., data = mtcars, importance = TRUE, proximity = TRUE)
# For a regression forest, this prints the OOB mean of squared residuals and % variance explained
print(fit)

# Look at variable importance:
importance(fit)

# Plot the OOB error (MSE) against the number of trees
plot(fit)
