
How is feature importance calculated for GradientBoostingClassifier

I'm using scikit-learn's gradient-boosted trees classifier, GradientBoostingClassifier. It exposes feature importance scores via its feature_importances_ attribute. How are those feature importances calculated?

I'd like to know what algorithm scikit-learn uses so I can interpret those numbers. The algorithm isn't described in the documentation.

This is documented elsewhere in the scikit-learn documentation. In particular, here is how it works:

For each tree, the importance of a feature F is calculated as the fraction of samples that traverse a node splitting on feature F. Those per-tree scores are then averaged across all trees in the ensemble.

It is not documented exactly how scikit-learn estimates the fraction of samples that will traverse a node that splits on feature F.
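The averaging step can be sketched directly against a fitted model. This is an illustrative sketch on hypothetical toy data, not scikit-learn's own implementation: internally, scikit-learn averages the *unnormalized* impurity-based importances of the individual trees before normalizing, so the manually averaged result below is close to, but not guaranteed identical to, feature_importances_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical toy data just for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# estimators_ is a 2-D array of shape (n_estimators, K); each entry is a
# DecisionTreeRegressor with its own per-tree feature_importances_.
per_tree = np.array([tree.feature_importances_
                     for stage in clf.estimators_
                     for tree in stage])

# Average across all trees, then renormalize so the scores sum to 1
manual = per_tree.mean(axis=0)
manual /= manual.sum()

print(manual)
print(clf.feature_importances_)
```

Comparing the two printed arrays shows how closely the simple "average over trees" picture tracks the attribute the ensemble reports.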

The interpretation: scores lie in the range [0, 1], and a higher score means the feature is more important. Concretely, feature_importances_ is an array of shape (n_features,) whose values are non-negative and sum to 1.0.
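Those properties are easy to verify, and the most common use of the scores is simply ranking features. A minimal check, again on hypothetical toy data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

imp = clf.feature_importances_
assert imp.shape == (6,)                         # one score per feature
assert np.all(imp >= 0) and np.isclose(imp.sum(), 1.0)

# Rank feature indices from most to least important
ranking = np.argsort(imp)[::-1]
print(ranking)
```

Because the scores are relative fractions of a fixed total, they are useful for comparing features within one model, not for comparing absolute importance across different models.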
