
Feature Importance for Random Forest Regressor in Python

I'm trying to find out which features are most important for my predictive model.

Currently I'm using sklearn's built-in attribute, like so:

from sklearn.ensemble import RandomForestRegressor

Model = RandomForestRegressor()
Model.fit(Train_Features, Labels_Train)  # Train_Features / Labels_Train: your training data
print(Model.feature_importances_)

It's just that it's more of a black-box method; I don't understand how it weights the importance of each feature. Is there a better approach for doing this?

Feature importance is not a black box when it comes to decision trees. From the documentation for DecisionTreeRegressor:

The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
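
To make that concrete, here is a minimal sketch that recomputes a single tree's importances by hand from the fitted tree's internal arrays (tree_.feature, tree_.impurity, and so on). Each split contributes the weighted impurity of its node minus the weighted impurities of its two children; those contributions are summed per feature and then normalized. The make_regression data is just a stand-in for illustration:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Stand-in data purely for illustration
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

t = tree.tree_
importances = np.zeros(t.n_features)
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity reduction
        continue
    # weighted impurity reduction produced by the split at this node
    importances[t.feature[node]] += (
        t.weighted_n_node_samples[node] * t.impurity[node]
        - t.weighted_n_node_samples[left] * t.impurity[left]
        - t.weighted_n_node_samples[right] * t.impurity[right]
    )
importances /= t.weighted_n_node_samples[0]  # scale by the root's sample count
importances /= importances.sum()             # normalize so the values sum to 1

print(np.allclose(importances, tree.feature_importances_))  # True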

For a forest, the importances are simply averaged across the individual trees. Check out the source code:

def feature_importances_(self):
    """Return the feature importances (the higher, the more important the
       feature).
    Returns
    -------
    feature_importances_ : array, shape = [n_features]
    """
    if self.estimators_ is None or len(self.estimators_) == 0:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before `feature_importances_`.")

    all_importances = Parallel(n_jobs=self.n_jobs,
                               backend="threading")(
        delayed(getattr)(tree, 'feature_importances_')
        for tree in self.estimators_)

    return sum(all_importances) / len(self.estimators_)
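
And here is a quick sanity check, again sketched with stand-in data, showing that averaging the per-tree importances yourself reproduces the forest's attribute, exactly as the source above does:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Stand-in data; substitute your own Train_Features / Labels_Train
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Average the per-tree importances by hand
manual = np.mean([est.feature_importances_ for est in forest.estimators_], axis=0)

print(np.allclose(manual, forest.feature_importances_))  # True (up to floating point)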
