How to obtain feature importance by class using ranger?

Question

I have been using the ranger and randomForest functions in R. I am particularly interested in getting the importance of features (predictors) for each class that I am trying to predict, rather than the overall importance across all classes. I know how to do this with the importance() function from randomForest, where it seems to be the default behaviour:

library(randomForest)
set.seed(100)
rfmodel <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)
importance(rfmodel)

This results in a matrix with the importance of each feature for each of the three classes.
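As a minimal sketch of what that matrix looks like in practice, the per-class columns can be separated from the overall measures by selecting the columns named after the class levels. The names imp, class_cols and imp_by_class are illustrative only and not part of the original question.

```r
library(randomForest)

set.seed(100)
rfmodel <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)

imp <- importance(rfmodel)            # matrix with one row per feature
class_cols <- levels(iris$Species)    # "setosa", "versicolor", "virginica"
imp_by_class <- imp[, class_cols]     # keep only the per-class columns
imp_by_class
```

The remaining columns of imp (MeanDecreaseAccuracy and MeanDecreaseGini) are the overall measures across all classes.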

Alternatively, for ranger I am running:

library(ranger)
rangermodel<-ranger(Species ~ ., data = iris, num.trees = 1000, write.forest=TRUE, importance="permutation", local.importance=TRUE)
importance(rangermodel)
rangermodel$variable.importance
rangermodel$variable.importance.local

rangermodel$variable.importance provides the importance of the features for the whole classification problem, but not by class, while rangermodel$variable.importance.local provides the importance for each case, but also not by class.

The ranger documentation does not seem to provide information on this. The only question I could find on the topic is this one: "How can I separate the overall variable importance values when using Random forest?" But it did not reach a conclusion on how to achieve this with ranger. Changing the ranger code as below did not provide the output I am looking for either:

rangermodel<-ranger(Species ~ ., data = iris, num.trees = 1000, write.forest=TRUE, importance="impurity")

Answer 1

The idea is to use local variable importance, defined as below:

For each case, consider all the trees for which it is oob. Subtract the percentage of votes for the correct class in the variable-m-permuted oob data from the percentage of votes for the correct class in the untouched oob data. This is the local importance score for variable m for this case.

Source: Breiman and Cutler's website, section: Variable Importance

Extracting local variable importance from ranger: you need to specify both importance = "permutation" and local.importance = TRUE.

library(ranger)
rf.iris <- ranger(Species ~ ., iris, importance = "permutation", 
             local.importance = TRUE)
rf.iris$variable.importance.local

Then you can average the local importance within each class:

library(data.table)
# Attach the observed class of each case, then average the local importance per class
as.data.table(rf.iris$variable.importance.local)[, Species := iris$Species][, lapply(.SD, mean), by = Species]

      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1:     setosa      0.01316     0.00252      0.11192     0.12548
2: versicolor      0.00800     0.00120      0.10672     0.11112
3:  virginica      0.01352     0.00316      0.10632     0.09956
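If data.table is not available, the same per-class averaging can probably be done in base R. The following is a sketch under that assumption, using aggregate(); the names local_imp and by_class are illustrative, and the seed is added only so the run is repeatable.

```r
library(ranger)

set.seed(100)
rf.iris <- ranger(Species ~ ., iris, importance = "permutation",
                  local.importance = TRUE)

# One row per case, one column per feature
local_imp <- as.data.frame(rf.iris$variable.importance.local)

# Average the case-wise importance within each observed class
by_class <- aggregate(local_imp, by = list(Species = iris$Species), FUN = mean)
by_class
```

The exact numbers depend on the seed and the number of trees, so they will not necessarily match the table above.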


Answer 1 (2021-03-17 13:17:34)
