Calculate R-squared (% Var explained) from a combined randomForest regression object
When fitting a randomForest regression, the printed object reports the R-squared as "% Var explained: ...".
library(randomForest)
library(doSNOW)
library(foreach)
library(ggplot2)
dat <- data.frame(ggplot2::diamonds[1:1000,1:7])
rf <- randomForest(formula = carat ~ ., data = dat, ntree = 500)
rf
# Call:
# randomForest(formula = carat ~ ., data = dat, ntree = 500)
# Type of random forest: regression
# Number of trees: 500
# No. of variables tried at each split: 2
#
# Mean of squared residuals: 0.001820046
# % Var explained: 95.22
However, when using a foreach loop to fit and combine multiple randomForest objects, the R-squared values are not available, as noted in ?combine:
The confusion, err.rate, mse and rsq components (as well as the corresponding components in the test component, if exist) of the combined object will be NULL.
cl <- makeCluster(8)
registerDoSNOW(cl)
rfPar <- foreach(ntree=rep(63,8),
.combine = combine,
.multicombine = TRUE,
.packages = "randomForest") %dopar%
{
randomForest(formula = carat ~ ., data = dat, ntree = ntree)
}
stopCluster(cl)
rfPar
# Call:
# randomForest(formula = carat ~ ., data = dat, ntree = ntree)
# Type of random forest: regression
# Number of trees: 504
# No. of variables tried at each split: 2
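The missing components can be checked directly on a combined object. The following is a minimal sketch that avoids the cluster setup and uses the built-in mtcars data instead of the diamonds subset; the object names rf1, rf2 and rf_all are made up for illustration.

```r
library(randomForest)

set.seed(1)
# Two small regression forests fitted on the same data
rf1 <- randomForest(mpg ~ ., data = mtcars, ntree = 50)
rf2 <- randomForest(mpg ~ ., data = mtcars, ntree = 50)

# Each individual forest carries its own out-of-bag mse and rsq vectors
length(rf1$rsq)

# combine() merges the trees but drops mse and rsq, as documented in ?combine
rf_all <- combine(rf1, rf2)
rf_all$ntree
is.null(rf_all$mse)
is.null(rf_all$rsq)
```

This is why the printed summary of rfPar above stops after "No. of variables tried at each split".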
Since this was not really answered in this question: is it at all possible to calculate the R-squared (% Var explained) and the mean of squared residuals from a randomForest object afterwards?
(Critics of this parallelization might suggest using caret::train(... method = "parRF") or other approaches instead. However, that turned out to take forever. In any case, an answer might be useful for anybody who uses combine to merge randomForest objects...)
Yes. You can calculate the R-squared value after the fact by predicting on your training data with the trained model and comparing the predictions to the actual values:
# taking the object from the question:
actual <- dat$carat
predicted <- unname(predict(rfPar, dat))
R2 <- 1 - (sum((actual-predicted)^2)/sum((actual-mean(actual))^2))
Or the root mean squared error (square it to obtain the mean of squared residuals):
caret::RMSE(predicted, actual)
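Both quantities can also be wrapped in a small base-R helper so that no extra package is needed. This is only a sketch: fit_stats is a hypothetical name, not a function from randomForest or caret, and the toy vectors stand in for the real actual and predicted values.

```r
# Hypothetical helper (not part of randomForest or caret):
# returns MSE, RMSE and % variance explained for two numeric vectors.
fit_stats <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  resid <- actual - predicted
  mse <- mean(resid^2)
  # R-squared as 1 - SSE / SST, the same formula as in the answer above
  rsq <- 1 - sum(resid^2) / sum((actual - mean(actual))^2)
  c(mse = mse, rmse = sqrt(mse), pct_var_explained = 100 * rsq)
}

# Toy data for illustration only
actual <- c(1, 2, 3, 4)
predicted <- c(1.1, 1.9, 3.2, 3.8)
stats <- fit_stats(actual, predicted)
print(stats)
```

With the objects from the question, the call would be fit_stats(dat$carat, unname(predict(rfPar, dat))). Keep in mind that these are in-sample predictions made with all trees, whereas the "% Var explained" that randomForest prints is based on out-of-bag predictions, so the value computed this way will usually look more optimistic.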