
How is feature importance calculated for GradientBoostingClassifier

I'm using scikit-learn's gradient-boosted trees classifier, GradientBoostingClassifier. It exposes feature importance scores via feature_importances_. How are these feature importances calculated?

I'd like to understand what algorithm scikit-learn uses, to help me interpret those numbers. The algorithm isn't listed in the documentation.
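A minimal sketch of fitting the classifier and reading the scores in question, assuming a toy dataset built with make_classification (the dataset and hyperparameters here are illustrative, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy binary-classification dataset with 5 features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# One non-negative score per feature; the scores sum to 1.0.
print(clf.feature_importances_)
```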

This is documented elsewhere in the scikit-learn documentation. In particular, here is how it works:

For each tree, we calculate the importance of a feature F as the fraction of samples that traverse a node that splits on feature F (see here). Then we average those numbers across all trees (as described here).

It is not described exactly how scikit-learn estimates the fraction of samples that will traverse a node that splits on feature F.
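For what it's worth, in current scikit-learn versions the per-tree score is a mean decrease in impurity: each split on feature F contributes the impurity reduction it achieves, weighted by the fraction of samples reaching that node. The sketch below reproduces the ensemble's reported scores from the individual trees; note that compute_feature_importances is an internal Tree method, not part of the stable public API, and the dataset is an illustrative stand-in:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Collect the unnormalized importances of every tree in the ensemble
# (skipping degenerate single-node trees), then normalize the average.
trees = [t for stage in clf.estimators_ for t in stage if t.tree_.node_count > 1]
raw = [t.tree_.compute_feature_importances(normalize=False) for t in trees]
avg = np.mean(raw, axis=0)
manual = avg / avg.sum()

print(np.allclose(manual, clf.feature_importances_))
```

Averaging the unnormalized per-tree importances (rather than each tree's normalized feature_importances_) matters: trees with larger total impurity reduction carry more weight in the ensemble-level score.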

The interpretation: scores are in the range [0, 1], and higher scores mean the feature is more important. The result is an array of shape (n_features,) whose values are non-negative and sum to 1.0.
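Since the scores sum to 1.0, each one reads as a feature's share of the total importance. A quick way to turn the array into a readable ranking (again on an illustrative toy dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

imp = clf.feature_importances_
# Sort feature indices from most to least important.
ranking = np.argsort(imp)[::-1]
for idx in ranking:
    print(f"feature {idx}: {imp[idx]:.3f}")
```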

