
Zero Importance Feature Removal Without Refit in SciKit-Learn GradientBoostingClassifier

After fitting a GradientBoostingClassifier in SciKit-Learn, some of the features have zero importance.

My understanding is that zero importance means no splits are made on that feature.

If I try to predict using a data set that does not include those features, it throws an error about not having all the features.

Of course I realize I can remove the zero-importance features, but I would rather not alter the already-fit model. (If I remove the zero-importance features and refit, I get a slightly different model.)

Is it a bug that the model requires zero-importance features to make predictions, or is there something about the zero-importance features I'm not thinking about? Is there a workaround to get exactly the same model?

(I'm foreseeing a question about why this matters: requiring the zero-importance features means pulling extra columns from a very large database, and it looks sloppy to include features in the model that do nothing.)
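A minimal sketch of the situation described above (the synthetic data and hyperparameters are assumptions for illustration, not from the original post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data in which several features are pure noise, so some of them
# typically end up with zero importance after fitting.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

zero_idx = np.where(clf.feature_importances_ == 0)[0]
print("zero-importance feature indices:", zero_idx)

# Dropping those columns before predicting raises a ValueError about the
# number of features, even though no tree ever splits on them.
X_reduced = np.delete(X, zero_idx, axis=1)
clf.predict(X_reduced)
```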

This is not a bug; it is the expected behavior. scikit-learn does not make assumptions, after a model has been trained, about which features should or should not have been included.

Instead, when you call fit there is an implicit assumption that you have already performed feature selection and removed the features that will not be important to the model. Once the model is fit, the expectation is that you will provide a dataset with the same number of features as the one used for fitting, regardless of whether every feature is important.
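That said, assuming (as in the question) that zero importance means the feature is never used in any split, its values cannot influence the prediction. One hedged workaround, not from the original answer, is to pull only the columns the trees actually use and pad the remaining positions with a constant before calling predict; the fitted model is left untouched and the predictions match. The helper name `predict_without_unused` below is hypothetical:

```python
import numpy as np

def predict_without_unused(clf, X_used, zero_idx, n_features_total):
    """Predict with a fitted model while supplying only the used columns.

    X_used holds the columns the trees actually split on, in their original
    order; zero_idx lists the positions of the dropped (zero-importance)
    columns in the full feature matrix.
    """
    X_full = np.zeros((X_used.shape[0], n_features_total))
    used_idx = [i for i in range(n_features_total) if i not in set(zero_idx)]
    X_full[:, used_idx] = X_used  # the padded zeros are never examined by any split
    return clf.predict(X_full)

# Sanity check against the snippet in the question (assuming clf, X, zero_idx exist):
# X_used = np.delete(X, zero_idx, axis=1)
# assert np.array_equal(predict_without_unused(clf, X_used, zero_idx, X.shape[1]),
#                       clf.predict(X))
```

This avoids pulling the unused columns from the database while keeping the already-fit model exactly as it is, at the cost of a small padding step before each prediction.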
