Scikit-Learn: manually specifying max_features in RFECV() - how many features get ranked?

I followed this Scikit-Learn example in Python to obtain .feature_importances_ from a forest estimator. That example uses ExtraTreesClassifier() with its default hyperparameter settings, which means max_features='auto'. The output of the example is a plot of importances for 10 features.

Question 1:

When I re-run this example with max_features=2, the plot still shows feature importances for all 10 features. Shouldn't it only show the importances for 2 features?

Question 2:

Now, I would like to use ExtraTreesClassifier(max_features=2) with RFECV(). The RFECV() docs indicate that RFECV() assigns the best features a rank of 1, which we can see in the .ranking_ attribute of RFECV(). However, if I specify the estimator to be ExtraTreesClassifier(max_features=2), does RFECV() use only 2 features in its estimator and return ranks for only 2 features? Or does it ignore max_features and return ranks for all the features?

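max_features specifies how many features the learning algorithm considers when searching for the best split at a node of the tree. A new random subset of that size is drawn at every node, so over the whole tree (and the whole forest) all features can still be used. max_features=2 therefore does not restrict the model to 2 features: .feature_importances_ still has one entry per input feature, and RFECV() still returns a rank in .ranking_ for every feature it was given. Limiting max_features mainly makes each split search cheaper and adds randomness between the trees.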
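A quick way to check this for both questions is a minimal sketch like the one below. It assumes synthetic data from make_classification with 10 features (not the dataset of the linked example), and the values n_estimators=50, cv=3 and min_features_to_select=2 are chosen only for illustration. min_features_to_select=2 is a precaution so that the estimator is never fitted on fewer features than max_features, which some scikit-learn versions may reject.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFECV

# Assumption: synthetic data with 10 features stands in for the example's data.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Question 1: fit a forest that considers only 2 random features per split.
forest = ExtraTreesClassifier(n_estimators=50, max_features=2, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_
print(importances.shape)  # (10,): one importance per input feature

# Question 2: use the same kind of estimator inside RFECV.
# Assumption: min_features_to_select=2 keeps every sub-fit at >= max_features.
selector = RFECV(
    ExtraTreesClassifier(n_estimators=50, max_features=2, random_state=0),
    step=1,
    cv=3,
    min_features_to_select=2,
)
selector.fit(X, y)
print(selector.ranking_.shape)  # (10,): one rank per input feature
print(selector.n_features_)     # number of features RFECV decided to keep
print(selector.ranking_)        # kept features have rank 1
```

How many features end up with rank 1 is decided by the cross-validated score inside RFECV(), not by max_features.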
