Variable importance using the caret package (error); RandomForest algorithm
I am trying to obtain the variable importance of a random forest (rf) model by any means. This is the approach I have tried so far, but alternative suggestions are very welcome.
I have trained a model in R:
require(caret)
require(randomForest)
myControl = trainControl(method='cv',number=5,repeats=2,returnResamp='none')
model2 = train(increaseInAssessedLevel~., data=trainData, method = 'rf', trControl=myControl)
The dataset is fairly large, but the model runs fine. I can access its parts and run commands such as:
> model2[3]
$results
mtry RMSE Rsquared RMSESD RsquaredSD
1 2 0.1901304 0.3342449 0.004586902 0.05089500
2 61 0.1080164 0.6984240 0.006195397 0.04428158
3 120 0.1084201 0.6954841 0.007119253 0.04362755
But I get the following error:
> varImp(model2)
Error in varImp[, "%IncMSE"] : subscript out of bounds
Apparently there is supposed to be a wrapper, but that does not seem to be the case (cf. http://www.inside-r.org/packages/cran/caret/docs/varImp ):
varImp.randomForest(model2)
Error: could not find function "varImp.randomForest"
But this is particularly odd:
> traceback()
No traceback available
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] elasticnet_1.1 lars_1.2 klaR_0.6-9 MASS_7.3-26
[5] kernlab_0.9-18 nnet_7.3-6 randomForest_4.6-7 doMC_1.3.0
[9] iterators_1.0.6 caret_5.17-7 reshape2_1.2.2 plyr_1.8
[13] lattice_0.20-15 foreach_1.4.1 cluster_1.14.4
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.1 grid_3.0.1 stringr_0.6.2
[5] tools_3.0.1
The importance scores can take a while to compute, and train won't automatically get randomForest to create them. Add importance = TRUE to the train call and it should work.
Max
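As an illustration of the suggestion above, here is a minimal sketch of passing importance = TRUE through train. It uses the built-in mtcars data set and the response mpg as stand-ins for the asker's trainData and increaseInAssessedLevel (both substitutions are assumptions made only so the example is self-contained), and it relies on train forwarding extra arguments to randomForest.

```r
library(caret)
library(randomForest)

set.seed(1)  # make the resampling reproducible

# Plain 5-fold cross-validation; mtcars/mpg are placeholders for the real data
ctrl <- trainControl(method = "cv", number = 5, returnResamp = "none")

# importance = TRUE is forwarded to randomForest(), which then stores
# the permutation-based (%IncMSE) scores alongside the node-purity scores
fit <- train(mpg ~ ., data = mtcars, method = "rf",
             trControl = ctrl, importance = TRUE)

# varImp() on the train object returns the scores scaled to 0-100 by default
imp <- varImp(fit)
print(imp)
```

Computing the permutation importance adds to the training time, which is presumably why it is not switched on by default.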
That is because the object returned by train() is not a pure random forest model but a list of different objects (containing the final model itself as well as the cross-validation results, etc.). You can see them with ls(model2). So to use the final model, just call varImp(model2$finalModel).
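To make the structure described above concrete, the following sketch lists the components of a train object and reads the importance scores directly from the underlying randomForest fit. The mtcars data and the mpg response are placeholders (an assumption, not the asker's data), and importance() from the randomForest package is used here in place of varImp() so that the raw, unscaled scores are visible.

```r
library(caret)
library(randomForest)

set.seed(1)

ctrl <- trainControl(method = "cv", number = 5, returnResamp = "none")
fit <- train(mpg ~ ., data = mtcars, method = "rf",
             trControl = ctrl, importance = TRUE)

# The train object is a list; finalModel is one of its components
print(names(fit))
print(class(fit$finalModel))

# Raw importance matrix from the randomForest object itself.
# For regression with importance = TRUE it has the columns
# "%IncMSE" and "IncNodePurity".
raw_imp <- importance(fit$finalModel)
print(raw_imp)
```

If the model was trained without importance = TRUE, the matrix would only contain the node-purity column, so the permutation-based scores still need to be requested at training time.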