
How to measure Random Forest classifier accuracy?

So, I am using a Random Forest classifier to make predictions using this code:

# Import Random Forest
from sklearn.ensemble import RandomForestClassifier

# Create a Random Forest classifier with 3 trees
clf_two = RandomForestClassifier(n_estimators=3)

# Train the model using the training sets
clf_two.fit(emb_train, ytrain.ravel())

# Predict labels for the test set
y_pred_two = clf_two.predict(emb_test)

I want to find out the accuracy of my classifier and tried doing this:

# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model Accuracy
print("Accuracy:", metrics.accuracy_score(ytrain, y_pred_two))

The problem is that y_pred_two is a 1-D array of shape (5989,) while ytrain is a column vector of shape (16128, 1). The two have different numbers of samples, so I get this error:

ValueError: Found input variables with inconsistent numbers of samples: [16128, 5989]

Is it still possible to measure the accuracy when y_pred_two and ytrain have different sizes, or am I doing something wrong? This is how the training and testing data were given to me.

Your quick help would be greatly appreciated!

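It seems to me that the issue is simply that you are comparing predictions made on the test samples (y_pred_two) against the target labels of the training set (ytrain). accuracy_score compares the two arrays element by element, so both must refer to the same samples and have the same length.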
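To make the mismatch concrete, here is a minimal sketch using made-up arrays that only reproduce the shapes reported in the question. The helper same_number_of_samples is a hypothetical name introduced for illustration, not part of scikit-learn:

```python
import numpy as np

# Hypothetical stand-ins with the shapes from the question (contents are dummy zeros).
ytrain = np.zeros((16128, 1), dtype=int)   # labels of the TRAINING set
y_pred_two = np.zeros(5989, dtype=int)     # predictions for the TEST set

def same_number_of_samples(y_true, y_pred):
    """Return True when both arrays contain one entry per sample and equal counts."""
    return np.ravel(y_true).shape[0] == np.ravel(y_pred).shape[0]

# The counts differ because the two arrays describe different sets of samples.
print(same_number_of_samples(ytrain, y_pred_two))
```

Flattening ytrain with ravel() would remove the (n, 1) versus (n,) difference, but it cannot fix the sample count, since the labels and the predictions belong to different datasets.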

You just need to load or generate the test set labels (ytest) and run:

print("Accuracy:", metrics.accuracy_score(ytest, y_pred_two))
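For reference, a self-contained sketch of the whole flow might look like the following. It assumes synthetic data from make_classification in place of the real embeddings, and a train_test_split to produce ytest; the variable names mirror the question, but the data, the split ratio, and the random_state values are assumptions made only so the example runs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real embeddings and labels (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
emb_train, emb_test, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same model as in the question; random_state is added only for reproducibility.
clf_two = RandomForestClassifier(n_estimators=3, random_state=0)
clf_two.fit(emb_train, ytrain)

# Predictions on the test set are scored against the TEST labels.
y_pred_two = clf_two.predict(emb_test)
test_accuracy = accuracy_score(ytest, y_pred_two)

# Training accuracy needs predictions made on the training set itself.
train_accuracy = accuracy_score(ytrain, clf_two.predict(emb_train))

print("Test accuracy:", test_accuracy)
print("Train accuracy:", train_accuracy)
```

clf_two.score(emb_test, ytest) should give the same test accuracy in one call, since the classifier's score method reports mean accuracy.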


 