How to interpret feature importance for ensemble methods?
I'm using ensemble methods (random forest, XGBClassifier, etc.) for classification.
One important aspect is the predicted feature importance, which looks like this:
            Importance
Feature-A   0.25
Feature-B   0.09
Feature-C   0.08
...
This model achieves an accuracy score of around 0.85. Feature-A is clearly the dominant feature, so I decided to remove it and recompute.
However, after removing Feature-A, I still got good performance, with accuracy around 0.79.
This doesn't make sense to me: Feature-A contributes 25% to the model, so why is the accuracy score barely affected when it is removed?
I know ensemble methods have an advantage in combining 'weak' features into 'strong' ones, so does the accuracy score mostly rely on aggregation, making it less sensitive to the removal of an important feature?
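For context, here is a minimal sketch of how an importance table like the one above is typically produced with scikit-learn (the data and the Feature-A…Feature-E column names are hypothetical, just to mirror the table):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real dataset
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
cols = [f"Feature-{c}" for c in "ABCDE"]

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ are impurity-based and sum to 1.0
importances = pd.Series(clf.feature_importances_, index=cols).sort_values(ascending=False)
print(importances)
```

Note that `feature_importances_` measures how much each feature reduced impurity inside *this* fitted model; it is not a direct measure of how much accuracy would drop if the feature were removed and the model retrained.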
Thanks
It's possible there are other features that are redundant with Feature-A. For instance, suppose that features G, H, I are redundant with Feature-A: if you know the values of G, H, I, then the value of Feature-A is pretty much determined.
That would be consistent with your results. If we include Feature-A, the model will learn to use it, as it's very simple to get excellent accuracy using just Feature-A and ignoring G, H, I; so the model will have excellent accuracy, high importance for Feature-A, and low importance for G, H, I. If we exclude Feature-A, the model can still get almost-as-good accuracy by using G, H, I, so it will still have very good accuracy (though the model might become more complicated, because the relationship between G, H, I and the class is more complicated than the relationship between Feature-A and the class).
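This redundancy effect is easy to reproduce on synthetic data. In the sketch below (all names and the construction `A = G + H + I` are made up for illustration), feature A fully determines the label, so the model with A gets near-perfect accuracy and assigns A most of the importance; but since A is itself a function of G, H, I, dropping A costs only a few points of accuracy:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
G, H, I = rng.normal(size=(3, 2000))
A = G + H + I                        # A is fully determined by G, H, I
X_full = np.column_stack([A, G, H, I, rng.normal(size=2000)])  # last column is pure noise
y = (A > 0).astype(int)              # the class depends directly on A

def fit_and_score(X):
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
    return clf.score(Xte, yte), clf.feature_importances_

acc_with_A, imp = fit_and_score(X_full)
acc_without_A, _ = fit_and_score(X_full[:, 1:])   # drop column A

print(f"with A:    acc={acc_with_A:.3f}, importance of A = {imp[0]:.2f}")
print(f"without A: acc={acc_without_A:.3f}")
```

The forest concentrates importance on A because a single split on A separates the classes, yet the accuracy drop from removing A is small, mirroring the 0.85 → 0.79 behavior in the question.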