Storing coefficients/variable importance from a loop in a dataframe/list/matrix

I will be running a bunch of different models on a dataset in a loop, and I'm looking for a way to place the variable importance/coefficients into a data frame for later reference.

I envision a dataframe/matrix with the model names as column headers and the list of all potential variables as the row names (or vice versa).

 library(MASS)
 library(caret)

 #which to use?
 coef_df = data.frame()
 coef_list = list()

 for (i in 0:1){
 subset = Boston[which(Boston$chas==i),]
 ctrl =trainControl(method='cv',number=5)
 rf_model = train(medv ~. , data=subset, trControl=ctrl, method='rf')
 gbm_model = train(medv ~. , data=subset, trControl=ctrl, method='gbm')
 #where does this go   =varImp(rf_model)
 #where does this go   =varImp(gbm_model)
  }

I think that is roughly 90% of the code I need. I just don't know how to place the variable importance values into the correct slot in a data frame/matrix, since the variables may come back in a different order on each varImp call, even if they happen to be the same here.

Thanks!

Central rule in R: forget the for loop; it is forbidden.

Here is how you can do this elegantly with data.table, showing only the results for the gbm method:

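library(data.table)  # MASS and caret are assumed loaded, as in the question
Boston.DT <- as.data.table(Boston)
ctrl <- trainControl(method = 'cv', number = 5)

gbm.DT <- Boston.DT[
  , {
    gbm_model <- train(medv ~ ., data = .SD, trControl = ctrl, method = 'gbm')
    imp <- varImp(gbm_model)$importance
    # data.table drops row names, so keep the variable names as a column
    data.table(variable = rownames(imp), importance = imp$Overall)
  }
  , keyby = .(chas1 = chas)
  ]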
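To get the layout the question describes (variables as rows, one column per model), one option is to match the importance values by variable name rather than by position, so the order returned by each varImp call stops mattering. Below is a minimal base R sketch of that idea. The helper name align_importance and the numbers in imp_list are made up for illustration; in practice each vector would come from a fitted model.

```r
# Hypothetical helper: combine named importance vectors into one matrix,
# matching on variable name so the order within each vector does not matter.
align_importance <- function(imp_list) {
  all_vars <- sort(unique(unlist(lapply(imp_list, names))))
  # Indexing a named vector by name yields NA for variables a model lacks.
  cols <- lapply(imp_list, function(v) unname(v[all_vars]))
  out <- do.call(cbind, cols)  # list names become column names
  rownames(out) <- all_vars
  out
}

# Made-up values for illustration only. With caret, each vector could be built
# as setNames(imp$Overall, rownames(imp)) where imp <- varImp(model)$importance.
imp_list <- list(
  rf_chas0  = c(lstat = 100, rm = 80, crim = 10),
  gbm_chas0 = c(rm = 100, crim = 5, lstat = 90, nox = 20)
)

coef_mat <- align_importance(imp_list)
print(coef_mat)
```

Variables that a model does not report end up as NA in that model's column. Use as.data.frame(coef_mat) if a data frame is preferred over a matrix.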
