
Plot feature importance in RandomForestRegressor sklearn

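I am new to data science. I am trying to find the feature importance ranking for my dataset. I have already fitted a random forest and got the output.

Here is my code: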

# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# importing dataset

dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values

# encoding categorical data

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

#country
labelencoder_X_1= LabelEncoder()
X[:,1]=labelencoder_X_1.fit_transform(X[:,1])

#gender
labelencoder_X_2= LabelEncoder()
X[:,2]=labelencoder_X_2.fit_transform(X[:,2])

onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()


# splitting dataset into test set and train set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)  
regressor.fit(X_train, y_train) 

For the importance part, I copied almost all of the example shown here: https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html

Here is the code:

# feature importance

importances = regressor.feature_importances_
std = np.std([tree.feature_importances_ for tree in regressor.estimators_],
             axis=0)
indices = np.argsort(importances)[::-1]
print("Feature ranking:")

for f in range(X.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))

# Plot the feature importances of the forest
plt.figure()
plt.title("Feature importances")
plt.bar(range(X.shape[1]), importances[indices],
       color="r", yerr=std[indices], align="center")
plt.xticks(range(X.shape[1]), indices)
plt.xlim([-1, X.shape[1]])
plt.show()

I was expecting output like the one shown in the documentation. Can someone help me? Thanks in advance.

My dataset is here: [image]

You have too many features to display in a single plot. Just plot some of them.

Here, I plot the 20 most important ones:

# Plot the feature importances of the forest
plt.figure(figsize=(18,9))
plt.title("Feature importances")
n=20
_ = plt.bar(range(n), importances[indices][:n], color="r", yerr=std[indices][:n])
plt.xticks(range(n), indices[:n])  # label only the n bars that are drawn
plt.xlim([-1, n])
plt.show()
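
The same top-n idea can be wrapped in a small helper and drawn as a horizontal bar chart, which tends to leave more room for labels. The following is only a sketch under stated assumptions: the data come from `make_regression` rather than from the asker's CSV, and the `top_importances` helper and the `f0`, `f1`, ... feature names are hypothetical, introduced purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def top_importances(forest, n):
    """Return (indices, importances, per-tree std) of the n most important features."""
    importances = forest.feature_importances_
    std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)
    top = np.argsort(importances)[::-1][:n]  # descending order, first n only
    return top, importances[top], std[top]


# Synthetic stand-in data (assumption): 200 samples, 50 features, 5 informative.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)
# Hypothetical feature names; with real data, use the actual column names.
feature_names = np.array(["f%d" % i for i in range(X_demo.shape[1])])

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)
top, top_imp, top_std = top_importances(forest, n=10)

fig, ax = plt.subplots(figsize=(8, 5))
# Reverse the order so the most important feature ends up at the top of the chart.
ax.barh(feature_names[top][::-1], top_imp[::-1], xerr=top_std[::-1], color="r")
ax.set_title("Top 10 feature importances")
ax.set_xlabel("Impurity-based importance")
fig.tight_layout()
```

In a notebook with `%matplotlib inline` the figure is displayed automatically; in a plain script, a call to `plt.show()` at the end would be needed.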

My code, in case you need it: https://filebin.net/be4h27swglqf3ci3

Output:

[image]

