

Print feature importance in percentage

I fit a basic LGBM model in Python:

# Create an instance
LGBM = LGBMRegressor(random_state=123, importance_type='gain')  # 'split' can also be selected here

# Fit the model (subset of data)
LGBM.fit(X_train_subset, y_train_subset)

# Predict y_pred
y_pred = LGBM.predict(X_test)

I am looking at the documentation:

importance_type (string, optional (default="split")) – How the importance is calculated. If "split", result contains numbers of times the feature is used in a model. If "gain", result contains total gains of splits which use the feature.

I used 'gain' and it prints the total gains:

# Print features sorted by importance
pd.DataFrame([X_train.columns, LGBM.feature_importances_]).T.sort_values([1], ascending = [True])

         0                    1
59   SLG_avg_p                0
4    PA_avg              2995.8
0    home                5198.55
26   next_home          11824.2
67   first_time_pitcher 15042.1
etc

I tried:

# get importance
importance = LGBM.feature_importances_
# summarize feature importance
for i, v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i,v))
# plot feature importance
plt.bar([x for x in range(len(importance))], importance)
plt.show()

And receive values and a plot:

Feature: 0, Score: 5198.55005
Feature: 1, Score: 20688.87198
Feature: 2, Score: 49147.90228
Feature: 3, Score: 71734.03088
etc

I also tried:

# feature importance
print(LGBM.feature_importances_)
# plot
plt.bar(range(len(LGBM.feature_importances_)), LGBM.feature_importances_)
plt.show()

How do I print the percentages for this model? For some reason I was sure they were calculated automatically.

The percentage option is available in the R version but not in the Python one. In Python you can do the following (using a made-up example, as I do not have your data):

from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
from lightgbm import LGBMRegressor
import pandas as pd

X, y = make_regression(n_samples=1000, n_features=10, n_informative=10, random_state=1)
feature_names = [f'Feature {i+1}' for i in range(10)]
X = pd.DataFrame(X, columns=feature_names)

model = LGBMRegressor(importance_type='gain')
model.fit(X, y)

feature_importances = (model.feature_importances_ / sum(model.feature_importances_)) * 100

results = pd.DataFrame({'Features': feature_names,
                        'Importances': feature_importances})
results.sort_values(by='Importances', inplace=True)

plt.barh(results['Features'], results['Importances'])
plt.xlabel('Importance percentages')
plt.show()

Output:

[Bar chart of feature importance percentages]
