I have followed this Scikit-Learn example in Python to obtain .feature_importances_ from a forest estimator. In that example, ExtraTreesClassifier() was used with its default hyperparameter settings, which means max_features='auto'. The output of the example is a plot of importances for 10 features.
Question 1:
When I re-run this example with max_features=2, the plot still shows feature importances for all 10 features. Should it only show the importances for 2 features?
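A quick sketch of what is observed: feature_importances_ has one entry per input feature regardless of max_features. (This uses a synthetic dataset via make_classification, not the exact data from the linked example.)

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the example's data: 10 features.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

clf = ExtraTreesClassifier(n_estimators=100, max_features=2,
                           random_state=0)
clf.fit(X, y)

# One importance value per input feature, even with max_features=2:
print(len(clf.feature_importances_))  # 10
```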
Question 2:
Now I would like to use ExtraTreesClassifier(max_features=2) with RFECV(). The RFECV() docs indicate that RFECV() assigns the best features a rank of 1, which we can see in its .ranking_ attribute. However, if I specify the estimator as ExtraTreesClassifier(max_features=2), does RFECV() use 2 features in its estimator and return ranks for only 2 features? Or does it ignore max_features and return ranks for all the features?
max_features specifies how many features the learning algorithm considers when deciding which feature provides the best split at a node of the tree. The candidate features are drawn at random for each node, so over the whole tree the estimator can still use all features. It is a way to speed up learning and to inject extra randomness into the ensemble; it does not restrict the model to a fixed subset of 2 features.
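The per-node behaviour can be seen on a single tree: with max_features=2, each split only compares 2 randomly drawn candidates, yet the fitted tree as a whole typically splits on many distinct features. (Synthetic data; DecisionTreeClassifier is used here because it exposes the fitted tree structure directly, but the same mechanism applies inside each tree of ExtraTreesClassifier.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=5, random_state=0)

tree = DecisionTreeClassifier(max_features=2, random_state=0).fit(X, y)

# Internal nodes store the feature index they split on (leaves store -2),
# so the distinct non-negative entries are the features actually used:
used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
print(len(used))  # typically well above 2
```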