
Getting a specific random forest variable importance measure from mlr package's resample function

I am using the mlr package's resample() function to subsample a random forest model 4000 times (see the code snippet below).

As you can see, to create the random forest models within resample() I'm using the randomForest package.

I want to get the random forest model's importance results (mean decrease in accuracy over all classes) for each of the subsample iterations. What I get right now as the importance measure is the mean decrease in the Gini index.

Looking at the source code of mlr, the getFeatureImportanceLearner.classif.randomForest() function (line 69) in makeRLearner.classif.randomForest uses the randomForest::importance() function (line 83) to get the importance values from the resulting object of class randomForest. But as you can see from the source code (line 73), it uses 2L as the default type. I want it to use 1L (line 75), i.e. the mean decrease in accuracy.
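For reference, here is a minimal sketch (my own toy example on the built-in iris data, using randomForest directly) of what the type argument of randomForest::importance() controls:

library(randomForest)

# importance = TRUE at fit time is required for the permutation-based measure
rf <- randomForest(Species ~ ., data = iris, ntree = 100, importance = TRUE)

# type = 1: mean decrease in accuracy (what I am after)
importance(rf, type = 1)

# type = 2: mean decrease in Gini index (what I currently get)
importance(rf, type = 2)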

How can I pass the type value 1L through the resample() function (the "extract = getFeatureImportance" line in the code below) so that getFeatureImportanceLearner.classif.randomForest() receives it and sets ctrl$type = 1L instead of the default at line 73?

library(mlr)

rf_task <- makeClassifTask(id = 'task',
                           data = data[, -1], target = 'target_var',
                           positive = 'positive_var')

rf_learner <- makeLearner('classif.randomForest', id = 'random forest',
                          par.vals = list(ntree = 1000, importance = TRUE),
                          predict.type = 'prob')

# rf_boot_desc is not defined in the snippet above; it is assumed to be a
# subsampling description with 4000 iterations, e.g.:
rf_boot_desc <- makeResampleDesc('Subsample', iters = 4000)

base_subsample_instance <- makeResampleInstance(rf_boot_desc, rf_task)

rf_subsample_result <- resample(rf_learner, rf_task,
                                base_subsample_instance,
                                extract = getFeatureImportance,
                                measures = list(acc, auc, tpr, tnr,
                                                ppv, npv, f1, brier))

My solution: I downloaded the source code of the mlr package, changed line 73 of the source file to 1L (https://github.com/mlr-org/mlr/blob/v2.15.0/R/RLearner_classif_randomForest.R), installed the package from the command line, and used it. Not an optimal solution, but a solution.
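For completeness, a rough sketch of the reinstall step (the local path is hypothetical):

# after editing R/RLearner_classif_randomForest.R in a local clone of mlr,
# reinstall from that source tree, e.g. from within R:
devtools::install("path/to/local/mlr")
# or from the shell: R CMD INSTALL path/to/local/mlr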

You provide a lot of specifics that do not actually relate to your question, at least as I understood it. So I wrote a simple MWE that includes the answer. The idea is that you have to write a short wrapper around getFeatureImportance so that you can pass your own arguments. Fans of purrr can do that with purrr::partial(getFeatureImportance, type = 2), but here I wrote myExtractor manually.

library(mlr)
rf_learner <- makeLearner('classif.randomForest', id = 'random forest',
                          par.vals = list(ntree = 100, importance = TRUE),
                          predict.type = 'prob')

measures = list(acc, auc, tpr, tnr,
                ppv, npv, f1, brier)

# wrapper that forwards a fixed type argument to getFeatureImportance
myExtractor = function(.model, ...) {
  getFeatureImportance(.model, type = 2, ...)
}

res = resample(rf_learner, sonar.task, cv10, 
               measures = measures, extract = myExtractor)

# first feature importance result:
res$extract[[1]]

# all values in a matrix:
sapply(res$extract, function(x) x$res)
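The purrr::partial variant mentioned above is used in the same place as myExtractor (purrr must be installed; otherwise it is equivalent):

res2 = resample(rf_learner, sonar.task, cv10,
                measures = measures,
                extract = purrr::partial(getFeatureImportance, type = 2))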

If you want to fit a bootstrapped learner, you should maybe also have a look at makeBaggingWrapper instead of solving this problem through resample().
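A minimal sketch of that alternative (the bw.iters and bw.replace values are just illustrative assumptions):

# let a bagging wrapper handle the repeated bootstrap fits inside one learner
base_lrn = makeLearner('classif.randomForest', ntree = 100, importance = TRUE)
bagged_lrn = makeBaggingWrapper(base_lrn, bw.iters = 10, bw.replace = TRUE)
bagged_mod = train(bagged_lrn, sonar.task)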
