
Best way to evaluate a random forest model accuracy on continuous data?

I have a random forest model that predicts a continuous variable: the response is not a categorical class but a number between 0 and 1. What is the best way to evaluate the accuracy of the fitted models in this case?

Should I split the data into training and test sets and then simply calculate the linear correlation between predicted and observed values in the test set?

Is there a more elegant solution? If so, which package implements it?

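You can of course hold out some of the data as a test set, but with a random forest this is usually not required, because the model comes with a "built-in" out-of-bag (OOB) error estimate: each tree is grown on a bootstrap sample, and the observations left out of that sample act as test data for that tree. For a continuous response, the OOB error is reported as a mean squared error together with the percentage of variance explained. Here is an example on the "mtcars" dataset that ends by plotting the OOB error against the number of trees: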

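# Install once if needed, then load the package
install.packages("randomForest")
library(randomForest)

head(mtcars)
set.seed(1)  # make the bootstrap samples reproducible
# mpg is numeric, so this fits a regression forest
fit <- randomForest(mpg ~ ., data = mtcars, importance = TRUE, proximity = TRUE)
# Prints the OOB mean of squared residuals and % variance explained
print(fit)

# Look at variable importance:
importance(fit)

# OOB error (MSE) vs. number of trees
plot(fit)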
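
If you want the numbers rather than the plot, the OOB predictions are stored on the fitted object, so error measures such as RMSE and MAE can be computed directly. The sketch below also includes the train/test split from the question for comparison. The 8-row test size and the variable names (oob_rmse, test_idx, and so on) are illustrative choices, not part of the original answer. Note that a correlation between predicted and observed values ignores systematic bias (predictions that are consistently too high can still correlate perfectly), so RMSE or MAE is usually the more informative measure.

```r
library(randomForest)

set.seed(1)
fit <- randomForest(mpg ~ ., data = mtcars)

# fit$predicted holds the OOB predictions: each row is predicted
# only by the trees that did not see it during training
obs      <- mtcars$mpg
oob_pred <- fit$predicted

oob_rmse <- sqrt(mean((obs - oob_pred)^2))
oob_mae  <- mean(abs(obs - oob_pred))
oob_r2   <- 1 - sum((obs - oob_pred)^2) / sum((obs - mean(obs))^2)

# The package also stores the OOB MSE and pseudo R-squared per number
# of trees; the last entries should closely match the values above
print(c(rmse = oob_rmse, mae = oob_mae, r2 = oob_r2))
print(c(mse_stored = fit$mse[fit$ntree], rsq_stored = fit$rsq[fit$ntree]))

# Optional: an explicit hold-out split (8 test rows is an arbitrary choice)
test_idx  <- sample(nrow(mtcars), size = 8)
train     <- mtcars[-test_idx, ]
test      <- mtcars[test_idx, ]
fit_train <- randomForest(mpg ~ ., data = train)
test_pred <- predict(fit_train, newdata = test)

test_rmse <- sqrt(mean((test$mpg - test_pred)^2))
test_cor  <- cor(test$mpg, test_pred)  # insensitive to bias; report alongside RMSE
print(c(test_rmse = test_rmse, test_cor = test_cor))
```

With only 32 rows in mtcars, the hold-out estimate depends heavily on which rows land in the test set, which is one reason the OOB estimate is convenient for small datasets.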
