
Zero Importance Feature Removal Without Refit in SciKit-Learn GradientBoostingClassifier

After fitting a GradientBoostingClassifier in SciKit-Learn, some of the features have zero importance.

My understanding is that zero importance means no splits are made on that feature.

If I try to predict using a data set that does not include those features, it throws an error about not having all the features.

Of course I realize I can remove the zero-importance features, but I would rather not alter the already-fit model. (If I remove the zero-importance features and refit, I get a slightly different model.)

Is it a bug that the model requires zero-importance features to make predictions, or is there something about the zero-importance features I'm not thinking about? Is there a workaround to get exactly the same model?

(I'm foreseeing a question about why this matters: requiring the zero-importance features means pulling extra columns from a very large database, and it looks sloppy to include features in the model that do nothing.)
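A minimal sketch of the situation described above (the synthetic data and hyperparameters are assumptions for illustration, not from the original post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data in which several features are pure noise, so some of them
# typically end up with zero importance after fitting.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

zero_idx = np.where(clf.feature_importances_ == 0)[0]
print("zero-importance feature indices:", zero_idx)

# Dropping those columns before predicting raises a ValueError about the
# number of features, even though no tree ever splits on them.
X_reduced = np.delete(X, zero_idx, axis=1)
clf.predict(X_reduced)
```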

This is not a bug; it is the expected behavior. scikit-learn does not make assumptions, after a model has been trained, about which features should or should not have been included.

Instead, when you call fit there is an implicit assumption that you have already performed feature selection and removed the features that will not be important to the model. Once the model is fit, the expectation is that you will provide a dataset with the same number of features as the one used for fitting, regardless of whether every feature is important.
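That said, assuming (as in the question) that zero importance means the feature is never used in any split, its values cannot influence the prediction. One hedged workaround, not from the original answer, is to pull only the columns the trees actually use and pad the remaining positions with a constant before calling predict; the fitted model is left untouched and the predictions match. The helper name `predict_without_unused` below is hypothetical:

```python
import numpy as np

def predict_without_unused(clf, X_used, zero_idx, n_features_total):
    """Predict with a fitted model while supplying only the used columns.

    X_used holds the columns the trees actually split on, in their original
    order; zero_idx lists the positions of the dropped (zero-importance)
    columns in the full feature matrix.
    """
    X_full = np.zeros((X_used.shape[0], n_features_total))
    used_idx = [i for i in range(n_features_total) if i not in set(zero_idx)]
    X_full[:, used_idx] = X_used  # the padded zeros are never examined by any split
    return clf.predict(X_full)

# Sanity check against the snippet in the question (assuming clf, X, zero_idx exist):
# X_used = np.delete(X, zero_idx, axis=1)
# assert np.array_equal(predict_without_unused(clf, X_used, zero_idx, X.shape[1]),
#                       clf.predict(X))
```

This avoids pulling the unused columns from the database while keeping the already-fit model exactly as it is, at the cost of a small padding step before each prediction.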
