
How to use pandas to create a crosstab showing the prediction results of a random forest classifier?

I'm new to random forests (and to Python). I'm using a random forest classifier on a dataset stored in a DataFrame named 't2002'.

t2002.columns

So here are the columns:

Index(['IndividualID', 'ES2000_B01ID', 'NSSec_B03ID', 'Vehicle', 'Age_B01ID',
   'IndIncome2002_B02ID', 'MarStat_B01ID', 'EcoStat_B03ID',
   'MainMode_B03ID', 'TripStart_B02ID', 'TripEnd_B02ID',
   'TripDisIncSW_B01ID', 'TripTotalTime_B01ID', 'TripTravTime_B01ID',
   'TripPurpFrom_B01ID', 'TripPurpTo_B01ID'],
  dtype='object')

I'm using the code below to run the classifier:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, accuracy_score
from sklearn.model_selection import GridSearchCV

from sklearn.model_selection import train_test_split
X_all = t2002.drop(['MainMode_B03ID'],axis=1)
y_all = t2002['MainMode_B03ID']
p = 0.2

X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=p, random_state=23)

clf = RandomForestClassifier()
acc_scorer = make_scorer(accuracy_score)

parameters = {}  # empty parameter grid: only the default settings are evaluated

grid_obj = GridSearchCV(clf,parameters,scoring=acc_scorer)
grid_obj = grid_obj.fit(X_train,y_train)

clf = grid_obj.best_estimator_
clf.fit(X_train,y_train)

predictions = clf.predict(X_test)
print(accuracy_score(y_test,predictions))

In this case, how could I use pandas to generate a crosstab (like a table) to show the detailed prediction results?

Thanks in advance!

You can first create a confusion matrix using sklearn and then convert it to a pandas DataFrame.

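import pandas as pd
from sklearn.metrics import confusion_matrix

# fix the label order so the rows and columns can be named reliably
labels = sorted(y_all.unique())

# create the confusion matrix as an array; compare against y_test,
# because the predictions were made on X_test only
confusion = confusion_matrix(y_test, predictions, labels=labels)

# convert to a DataFrame (rows: actual classes, columns: predicted classes)
new_df = pd.DataFrame(confusion, index=labels, columns=labels)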
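Since the question asks for a crosstab specifically, pd.crosstab can build the same actual-versus-predicted table in one call. The snippet below is only a minimal sketch: the small y_test and predictions values are made-up stand-ins for the ones in the question, and both are converted to plain arrays first so that pandas pairs them by position rather than by the shuffled index that train_test_split leaves on y_test.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for y_test and predictions from the question (assumption).
y_test = pd.Series([1, 2, 2, 3, 1, 3, 2], index=[10, 4, 7, 1, 9, 3, 5])
predictions = np.array([1, 2, 3, 3, 1, 2, 2])

# Convert both to arrays so values are matched by position, not by index.
actual = np.asarray(y_test)
predicted = np.asarray(predictions)

# Rows are the actual classes, columns are the predicted classes.
table = pd.crosstab(
    actual,
    predicted,
    rownames=["Actual"],
    colnames=["Predicted"],
    margins=True,  # adds an "All" row and column holding the totals
)
print(table)
```

The diagonal cells count the correct predictions for each class; passing normalize="index" instead of margins=True would show the same table as row-wise proportions.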

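It's easy to show the grid search results using pandas. Use the cv_results_ attribute of the fitted GridSearchCV object, as described in the docs. Note that it holds the cross-validation scores for each parameter combination, not the per-sample predictions.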

import pandas as pd

results = pd.DataFrame(grid_obj.cv_results_)  # grid_obj is the fitted GridSearchCV object; best_estimator_ has no cv_results_
print(results.head()) 
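To make the cv_results_ table easier to read, it may help to keep only a few columns and sort by rank. The following is a rough, self-contained sketch: the synthetic data from make_classification and the small param_grid are assumptions for illustration only (the grid in the question was empty), so the scores it prints say nothing about the t2002 data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for X_train / y_train (assumption).
X, y = make_classification(n_samples=120, n_features=6, random_state=23)

# Hypothetical grid, chosen only so that there is more than one row to show.
param_grid = {"n_estimators": [10, 20], "max_depth": [3, None]}

grid_obj = GridSearchCV(
    RandomForestClassifier(random_state=23),
    param_grid,
    scoring="accuracy",
    cv=3,
)
grid_obj.fit(X, y)

# One row per parameter combination, best-ranked first.
results = pd.DataFrame(grid_obj.cv_results_)
summary = results[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(summary)
```

Each row summarises one parameter combination across the cross-validation folds, which is why this view complements the confusion matrix rather than replacing it.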
