Calculate R-squared (% Var explained) from a combined randomForest regression object

Question

When fitting a randomForest regression, the printed object reports the R-squared as "% Var explained: ...".

library(randomForest)
library(doSNOW)
library(foreach)
library(ggplot2)

dat <- data.frame(ggplot2::diamonds[1:1000,1:7])
rf <- randomForest(formula = carat ~ ., data = dat, ntree = 500)
rf
# Call:
#   randomForest(formula = carat ~ ., data = dat, ntree = 500) 
#                Type of random forest: regression
#                      Number of trees: 500
# No. of variables tried at each split: 2
# 
# Mean of squared residuals: 0.001820046
# % Var explained: 95.22

However, when using a foreach loop to fit and combine multiple randomForest objects, the R-squared values are not available, as noted in ?combine:

The confusion, err.rate, mse and rsq components (as well as the corresponding components in the test component, if exist) of the combined object will be NULL

cl <- makeCluster(8)
registerDoSNOW(cl)

rfPar <- foreach(ntree=rep(63,8), 
                 .combine = combine, 
                 .multicombine = TRUE, 
                 .packages = "randomForest") %dopar% 
                 {
                   randomForest(formula = carat ~ ., data = dat, ntree = ntree)
                 }
stopCluster(cl)

rfPar
# Call:
#   randomForest(formula = carat ~ ., data = dat, ntree = ntree) 
#                Type of random forest: regression
#                      Number of trees: 504
# No. of variables tried at each split: 2

Since it was not really answered in a related question: is it at all possible to calculate the R-squared (% Var explained) and the mean of squared residuals from a randomForest object afterwards?

(Critics of this parallelization might argue for caret::train(... method = "parRF") or other approaches. However, that turned out to take forever. In any case, an answer might be useful for anybody who uses combine to merge randomForest objects.)

Answer 1

Yes. You can calculate the R-squared value after the fact by predicting on your training data with the trained model and comparing the predictions to the actual values:

# taking the object from the question:
actual <- dat$carat
predicted <- unname(predict(rfPar, dat))

R2 <- 1 - (sum((actual-predicted)^2)/sum((actual-mean(actual))^2))

Or the root mean squared error (square it to get the mean squared error):

caret::RMSE(predicted, actual)
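Both quantities can also be wrapped in one small base-R helper. This is only a sketch: `regression_metrics` is a hypothetical name, not a function from randomForest or caret, and the toy vectors stand in for `dat$carat` and `predict(rfPar, dat)`. It uses the same definition as the formula above, which matches randomForest's `rsq` (1 - mse / Var(y), with both terms divided by n).

```r
# Hypothetical helper (not part of randomForest or caret):
# regression metrics from observed and predicted values.
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  ss_res <- sum((actual - predicted)^2)      # residual sum of squares
  ss_tot <- sum((actual - mean(actual))^2)   # total sum of squares
  mse <- ss_res / length(actual)             # mean of squared residuals
  list(mse = mse, rmse = sqrt(mse), rsq = 1 - ss_res / ss_tot)
}

# Toy vectors standing in for dat$carat and predict(rfPar, dat)
actual    <- c(1, 2, 3, 4)
predicted <- c(1.5, 2, 3, 3.5)

m <- regression_metrics(actual, predicted)
m$mse         # mean of squared residuals
m$rsq * 100   # "% Var explained"
```

With the objects from the question, the call would be `regression_metrics(dat$carat, unname(predict(rfPar, dat)))`. Note that the value printed by a single randomForest fit is based on out-of-bag predictions, whereas predicting on the training rows is an in-sample estimate and will generally look more favourable. Predicting on rows held out from training gives a fairer comparison.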

Answer 1: 7 votes, accepted, 2017-05-23 15:11:59
