
How Random Forest and XGB 'Regressor' calculate feature importance

I am trying to find out how Random Forest and XGBoost regressors calculate feature importance. However, most of the discussion I can find focuses on classifiers.

I tried to find the answer in the official documentation, but I still have some questions.

  1. In the XGBoost documentation, the description of get_score says: 'For linear model, only "weight" is defined and it's the normalized coefficients without bias.' Does that mean the feature importance is calculated only from the coefficients between input and output, not from MSE or Gini?
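For the tree (non-linear) boosters, importance types such as "gain" are typically computed by summing, per feature, the loss reduction of every split that used that feature, then normalizing. This is not the linear-model "weight" path from the quoted docs; below is a stdlib-only sketch of that aggregation, with made-up split data:

```python
from collections import defaultdict

def gain_importance(splits):
    """Aggregate per-split gains into normalized feature importances.

    splits: list of (feature_name, gain) pairs, one pair per tree split.
    Returns a dict mapping feature -> its fraction of the total gain.
    """
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    return {f: g / grand_total for f, g in totals.items()}

# Toy example: three splits across two (hypothetical) features.
splits = [("age", 3.0), ("income", 1.0), ("age", 1.0)]
print(gain_importance(splits))  # {'age': 0.8, 'income': 0.2}
```

The normalization step is why a model's reported importances sum to 1.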

  2. In the scikit-learn documentation, the description of feature_importances_ says: 'The impurity-based feature importances.' But the RF source code (line 1125) notes: 'Supported criteria are "mse" for the mean squared error, which is equal to variance reduction as feature selection criterion.' Does the RF regressor use impurity or MSE for the feature importance calculation?
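As I understand it, these two statements describe the same thing: for a regression tree, the "impurity" of a node is its MSE (the variance of the targets in that node), so "impurity-based" importance is exactly the accumulated MSE/variance reduction. A stdlib-only sketch of the quantity one split contributes (the numbers are made up):

```python
def variance(values):
    """Node impurity for regression: MSE around the node mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def impurity_decrease(parent, left, right):
    """Weighted impurity (variance) reduction of one split -- the quantity
    a regression tree accumulates per feature for feature_importances_."""
    n = len(parent)
    return (variance(parent)
            - (len(left) / n) * variance(left)
            - (len(right) / n) * variance(right))

parent = [1.0, 2.0, 9.0, 10.0]
left, right = [1.0, 2.0], [9.0, 10.0]
print(impurity_decrease(parent, left, right))  # 16.0
```

Summing these decreases over all splits on a feature (and normalizing across features) gives the impurity-based importance.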

I think of Gini impurity as a criterion for classification, so the descriptions above confused me.
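Gini is indeed the classification criterion; its regression counterpart is variance/MSE. A toy side-by-side comparison, stdlib only:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: the classification splitting criterion."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def mse(values):
    """MSE around the node mean: the regression splitting criterion."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(gini(["a", "a", "b", "b"]))  # 0.5
print(mse([1.0, 1.0, 2.0, 2.0]))  # 0.25
```

Either way, the importance bookkeeping is the same: record how much the chosen criterion drops at each split.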

It would be helpful if someone could give me guidance on how to read these documents, for example, how to trace which function I actually ran.
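One general way to trace which implementation actually runs is Python's inspect module: it tells you the defining file and source of any importable callable. The example uses json.dumps as a stand-in for, say, a scikit-learn method:

```python
import inspect
import json

# getsourcefile tells you which file defines the callable;
# getsource returns its code so you can read the exact implementation.
print(inspect.getsourcefile(json.dumps))
print(inspect.getsource(json.dumps).splitlines()[0])  # the "def dumps(...)" line
```

Running the same calls on RandomForestRegressor.fit would point you at the scikit-learn source file you need to read.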

Thank you!

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

clf = RandomForestClassifier()
clf.fit(df.drop('name', axis=1), df['name'])

plt.figure(figsize=(10, 10))
plt.bar(df.drop('name', axis=1).columns, height=clf.feature_importances_,
        bottom=0, width=0.8)
plt.xticks(rotation=80)

high_rate_cols = df.drop('name', axis=1).columns[clf.feature_importances_ > 0.1]
x_train_rate, x_test_rate, y_train_rate, y_test_rate = train_test_split(
    df[high_rate_cols], df['name'])

