I'm trying to find out which features are most important for my predictive model.
Currently I'm using sklearn's built-in attribute, like so:
Model = Model.fit(Train_Features, Labels_Train)
print(Model.feature_importances_)
It's just that it's more of a black-box method; I don't understand how it weights the importance of each feature. Is there a better approach for doing this?
Feature importance is not a black box when it comes to decision trees. From the documentation for a DecisionTreeRegressor:
The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
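As a quick illustration (a minimal sketch using sklearn's public API; the dataset and hyperparameters are arbitrary), you can fit a small tree and inspect the impurity-based importances directly:

```python
# Sketch: fit a decision tree and inspect its impurity-based importances.
# Dataset and hyperparameters are arbitrary, for illustration only.
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=4, n_informative=2,
                       random_state=0)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# Importances are the normalized total reductions of the splitting
# criterion, so they are non-negative and sum to 1.
print(tree.feature_importances_)
```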
For a forest, it simply averages these importances across the trees in your forest. Check out the source code:
def feature_importances_(self):
    """Return the feature importances (the higher, the more important the
    feature).

    Returns
    -------
    feature_importances_ : array, shape = [n_features]
    """
    if self.estimators_ is None or len(self.estimators_) == 0:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before `feature_importances_`.")

    all_importances = Parallel(n_jobs=self.n_jobs,
                               backend="threading")(
        delayed(getattr)(tree, 'feature_importances_')
        for tree in self.estimators_)

    return sum(all_importances) / len(self.estimators_)
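If you do want an alternative that is less tied to the tree-building criterion, one common option is permutation importance, which sklearn exposes as `sklearn.inspection.permutation_importance`. A minimal sketch (the dataset, model, and split here are illustrative assumptions, not from your code):

```python
# Sketch: permutation importance on a held-out set, as a model-agnostic
# alternative to impurity-based importances.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=4, n_informative=2,
                       random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature column in turn and measure the drop in score;
# a large drop means the model relied heavily on that feature.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
print(result.importances_mean)
```

Because it is computed on held-out data, permutation importance also avoids the known bias of impurity-based importances toward high-cardinality features.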