How to interpret feature importance for ensemble methods?

I'm using ensemble methods (random forest, XGBClassifier, etc.) for classification.

One important output is the feature importance ranking, which looks like this:

Feature     Importance
Feature-A   0.25
Feature-B   0.09
Feature-C   0.08
...

The model achieves an accuracy of around 0.85. Feature-A is clearly the dominant feature, so I decided to remove it and recalculate.

However, after removing Feature-A, performance was still good, with accuracy around 0.79.

This doesn't make sense to me: if Feature-A contributes 25% of the model's importance, why is accuracy barely affected when it is removed?

I know ensemble methods combine 'weak' learners into a 'strong' one, so does accuracy rely mostly on aggregation and therefore stay relatively insensitive to the removal of an important feature?
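For reference, here is a minimal, self-contained sketch of my experiment. The data below are synthetic stand-ins generated with make_classification; my real dataset and feature names are not shown:

    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in data (placeholder for my real dataset).
    X_arr, y = make_classification(n_samples=2000, n_features=10,
                                   n_informative=5, random_state=0)
    X = pd.DataFrame(X_arr, columns=[f"Feature-{c}" for c in "ABCDEFGHIJ"])

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Fit the full model and print the importance table and accuracy.
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False))
    print(accuracy_score(y_test, model.predict(X_test)))

    # Drop the top-ranked feature, retrain, and compare accuracy.
    top = importances.idxmax()
    reduced = RandomForestClassifier(random_state=0).fit(
        X_train.drop(columns=top), y_train)
    print(accuracy_score(y_test, reduced.predict(X_test.drop(columns=top))))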

Thanks

It's possible there are other features that are redundant with Feature-A. For instance, suppose that features G, H, I are redundant with Feature-A: if you know the values of G, H, I, then the value of Feature-A is pretty much determined.

That would be consistent with your results. If Feature-A is included, the model will learn to use it, since it's very simple to get excellent accuracy from Feature-A alone while ignoring G, H, I; so you see excellent accuracy, high importance for Feature-A, and low importance for G, H, I. If Feature-A is excluded, the model can still reach almost-as-good accuracy by using G, H, I, so it still performs well (though the model may become more complicated, because the relationship between G, H, I and the class is more complex than the relationship between Feature-A and the class).
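Here is a small synthetic demonstration of this effect. Everything below (the feature construction, sample size, names) is invented purely to illustrate the mechanism, not taken from your data:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    n = 5000
    G, H, I = rng.normal(size=(3, n))
    A = G + H + I + rng.normal(scale=0.1, size=n)  # A is almost determined by G, H, I
    noise = rng.normal(size=(n, 3))                # irrelevant filler features
    y = (G + H + I > 0).astype(int)                # class depends on the shared signal

    X = pd.DataFrame({"A": A, "G": G, "H": H, "I": I,
                      "N1": noise[:, 0], "N2": noise[:, 1], "N3": noise[:, 2]})
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Full model: A carries the signal most directly, so trees split on it.
    full = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print(pd.Series(full.feature_importances_, index=X.columns).round(2))
    print("with A:   ", accuracy_score(y_test, full.predict(X_test)))

    # Without A: the forest recovers the same signal from G, H, I.
    reduced = RandomForestClassifier(random_state=0).fit(
        X_train.drop(columns="A"), y_train)
    print("without A:", accuracy_score(y_test,
                                       reduced.predict(X_test.drop(columns="A"))))

On a run like this you should see most of the importance assigned to A while G, H, I score low, yet only a small drop in test accuracy once A is removed.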
