
Feature Importance for Random Forest Regressor in Python

I'm trying to find out which features are most important for my predictive model.

Currently I'm using sklearn's built-in attribute, like so:

from sklearn.ensemble import RandomForestRegressor

Model = RandomForestRegressor()
Model.fit(Train_Features, Labels_Train)  # Train_Features / Labels_Train: your training data
print(Model.feature_importances_)

It's just that it's more of a black-box method; I don't understand how it weights the importance of each feature. Is there a better approach for doing this?

Feature importance is not a black box when it comes to decision trees. From the documentation for DecisionTreeRegressor:

The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
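
To make that concrete, here is a minimal sketch that recomputes a single tree's importances by hand from the fitted tree's internal arrays (tree_.feature, tree_.impurity, and so on). Each split contributes the weighted impurity of its node minus the weighted impurities of its two children; those contributions are summed per feature and then normalized. The make_regression data is just a stand-in for illustration:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Stand-in data purely for illustration
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

t = tree.tree_
importances = np.zeros(t.n_features)
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity reduction
        continue
    # weighted impurity reduction produced by the split at this node
    importances[t.feature[node]] += (
        t.weighted_n_node_samples[node] * t.impurity[node]
        - t.weighted_n_node_samples[left] * t.impurity[left]
        - t.weighted_n_node_samples[right] * t.impurity[right]
    )
importances /= t.weighted_n_node_samples[0]  # scale by the root's sample count
importances /= importances.sum()             # normalize so the values sum to 1

print(np.allclose(importances, tree.feature_importances_))  # True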

For a forest, the importances are simply averaged across the individual trees. Check out the source code:

def feature_importances_(self):
    """Return the feature importances (the higher, the more important the
       feature).
    Returns
    -------
    feature_importances_ : array, shape = [n_features]
    """
    if self.estimators_ is None or len(self.estimators_) == 0:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before `feature_importances_`.")

    all_importances = Parallel(n_jobs=self.n_jobs,
                               backend="threading")(
        delayed(getattr)(tree, 'feature_importances_')
        for tree in self.estimators_)

    return sum(all_importances) / len(self.estimators_)
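
And here is a quick sanity check, again sketched with stand-in data, showing that averaging the per-tree importances yourself reproduces the forest's attribute, exactly as the source above does:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Stand-in data; substitute your own Train_Features / Labels_Train
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Average the per-tree importances by hand
manual = np.mean([est.feature_importances_ for est in forest.estimators_], axis=0)

print(np.allclose(manual, forest.feature_importances_))  # True (up to floating point)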
