'numpy.ndarray' object has no attribute 'columns'
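I'm trying to find the feature importances for a random forest classification task, but it gives me the following error:
'numpy.ndarray' object has no attribute 'columns'
Here is part of my code: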
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# importing dataset
dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values
# splitting dataset into test set and train set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
#feature importance
feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)
I expected this to give a feature importance score for each column of my dataset. (Note: the original data is in CSV format.)
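So X_train, which comes out of train_test_split, is actually a numpy array here, and a numpy array never has a columns attribute. Secondly, when you create X from dataset, you ask for .values, which returns a numpy.ndarray rather than a DataFrame.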
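To make the point about .values concrete, here is a minimal sketch. The small DataFrame and its column names are hypothetical stand-ins for the CSV in the question; only the difference between the two slicing styles matters.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV loaded in the question.
df = pd.DataFrame({"age": [30, 41, 52], "balance": [0.0, 1200.5, 310.0]})

as_array = df.iloc[:, 0:2].values  # numpy.ndarray: the column labels are dropped
as_frame = df.iloc[:, 0:2]         # DataFrame: the column labels are kept

print(type(as_array).__name__, hasattr(as_array, "columns"))
print(type(as_frame).__name__, list(as_frame.columns))
```

Anything derived from as_array, including the output of train_test_split, carries no column labels.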
You need to change your line
feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)
to
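# take the labels of the same columns that were used to build X
columns_ = dataset.iloc[:1, 3:12].columns
# `regressor` is the fitted model defined in the question's code
feature_importances = pd.DataFrame(regressor.feature_importances_,index = columns_,columns=['importance']).sort_values('importance',ascending=False)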
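An alternative that avoids looking the labels up separately is to keep X as a DataFrame all the way through. train_test_split returns DataFrames when it is given a DataFrame, so X_train.columns is then available. The sketch below uses synthetic data and made-up feature names (f0, f1, f2) instead of Churn_Modelling.csv, and a RandomForestClassifier because the question describes a classification task. Treat it as an illustration under those assumptions, not as the asker's actual pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical feature names; the label depends only on f0.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f0", "f1", "f2"])
y = (X["f0"] > 0).astype(int)

# No .values here, so the split keeps X_train and X_test as DataFrames.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X_train, y_train)

# X_train is a DataFrame, so its column labels can index the importances.
feature_importances = pd.DataFrame(
    model.feature_importances_, index=X_train.columns, columns=["importance"]
).sort_values("importance", ascending=False)
print(feature_importances)
```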
Use this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# importing dataset
dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values
# splitting dataset into test set and train set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
#feature importance
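# index with the labels of the columns used to build X (3:12), not all of dataset.columns, so the lengths match
feature_importances = pd.DataFrame(regressor.feature_importances_,index = dataset.columns[3:12],columns=['importance']).sort_values('importance',ascending=False)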
iloc and loc can only be applied to pandas objects such as DataFrames. You are applying them to an array. Solution: convert the array to a DataFrame, then apply iloc or loc.
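A minimal sketch of that conversion follows. The array contents and the feature names are hypothetical; in practice the names would come from the original dataset's columns.

```python
import numpy as np
import pandas as pd

# Hypothetical array, as might come out of .values or train_test_split.
X_array = np.array([[619, 42, 2], [608, 41, 1], [502, 42, 8]])
feature_names = ["CreditScore", "Age", "Tenure"]  # assumed labels, for illustration only

# Wrapping the array in a DataFrame restores .columns, .iloc and .loc.
X_frame = pd.DataFrame(X_array, columns=feature_names)

print(X_frame.columns.tolist())
print(X_frame.iloc[:, 0:2])
```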