简体   繁体   中英

'numpy.ndarray' object has no attribute 'columns'

I am trying to find out the feature importance for Random Forest Classification Task. But it gives me following error :

'numpy.ndarray' object has no attribute 'columns'

Here is a portion of my code :

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# importing dataset

dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values

#spliting dataset into test set and train set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)  
regressor.fit(X_train, y_train) 

#feature importance

feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)


I expect this should give the features importance score for each column of my dataset. (Note: the original data is in CSV formate)

So X_train that comes out from train_test_split is actually a numpy array which will never have a columns. Secondly, you are asking for values when you make X from dataset which returns the numpy.ndarry and not a df.

You need to changes your line

feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)

to

columns_ = dataset.iloc[:1, 3:12].columns

feature_importances = pd.DataFrame(rf.feature_importances_,index = columns_,columns=['importance']).sort_values('importance',ascending=False)

Use this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# importing dataset

dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values

#spliting dataset into test set and train set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)  
regressor.fit(X_train, y_train) 

#feature importance

feature_importances = pd.DataFrame(regressor.feature_importances_,index = dataset.columns,columns=['importance']).sort_values('importance',ascending=False)


The iloc and loc functions could be applied to Pandas dataframe only. You are applying them in to an array. Solution: Convert the array into dataframe then apply the iloc or loc

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM