I am using Random Forest (from sklearn) for a multi-class classification problem with ordered classes (say 0,...,n, with n=4 in my specific case) that are roughly equally distributed. I have many observations (roughly 5000) and I split them into train/test sets, 70%/30% respectively - the classes are equally distributed in both train and test. I set random_state=None, so each time I re-run the fitting of the model (on the same training set) and then the prediction, I obtain slightly different results on my test set.
My question is how to measure if Random Forest is working well by comparing different predictions...
For example, if for a given sample I first obtain the prediction 0 and then n (where, as said, 0 and n are the most different classes), I would say that the RF is not working at all. On the contrary, if only a few predictions change from one class to a neighbouring one (e.g. first 0 and then 1), I would say the RF is working well.
Is there a specific command to check this automatically?
I think for this type of investigation we do not care whether the classifier made the right prediction; we want to know whether it made stable (i.e. consistent) predictions.
Assume repeated_predictions has shape [repetitions, samples] and contains the predicted class for each test sample, with one row per repetition.
What about:
np.mean(np.std(repeated_predictions,axis=0))
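The idea above can be sketched end to end as follows. This is only an illustration under stated assumptions: the data come from make_classification (synthetic, so the labels 0..4 are not truly ordered), the helper name prediction_stability is hypothetical, and the seeds 0..4 stand in for re-running with random_state=None. Besides the mean standard deviation, it reports the largest per-sample jump between classes and the fraction of samples whose prediction ever changes, which speaks directly to the "0 then n" versus "0 then 1" distinction in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def prediction_stability(repeated_predictions):
    """Summarise how much predictions vary across repetitions.

    repeated_predictions: array of shape [repetitions, samples].
    (Hypothetical helper, not part of sklearn.)
    """
    preds = np.asarray(repeated_predictions)
    per_sample_std = preds.std(axis=0)                         # spread per sample
    per_sample_range = preds.max(axis=0) - preds.min(axis=0)   # largest class jump
    return {
        "mean_std": float(per_sample_std.mean()),
        "max_range": int(per_sample_range.max()),
        "frac_unstable": float(np.mean(per_sample_range > 0)),
    }


# Synthetic stand-in for the real data (assumption: 5 classes, 500 samples).
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Refit on the same training set with different seeds, predict the same test set.
repeated_predictions = np.array([
    RandomForestClassifier(n_estimators=50, random_state=seed)
    .fit(X_train, y_train)
    .predict(X_test)
    for seed in range(5)
])

stability = prediction_stability(repeated_predictions)
print(stability)
```

A small mean_std together with a max_range of 1 would mean that predictions only ever move to a neighbouring class; a max_range close to n would indicate the unstable behaviour described in the question.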
There are also papers that analyze the consistency of Random Forests, e.g. "Consistency of Random Forests and Other Averaging Classifiers", but it seems to be a tough read.
One solution is to use cross-validation. With it you obtain a robust measure of the general accuracy of the model.
You then train and test n different models, one per split (check this link, it is pretty well explained). You can calculate the accuracy of each model and then take the mean of these measures. An example would be (with 5 splits):
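from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X, y, cv=5)  # clf: your classifier, X/y: features and labels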
And then print the mean and standard deviation of these accuracies:
print("%0.2f accuracy with a standard deviation of %0.2f" % (scores.mean(), scores.std()))
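For completeness, here is a self-contained sketch of the cross-validation approach. The synthetic dataset from make_classification and the choice of n_estimators=50 are assumptions made only so the snippet runs on its own; with real data, X and y would be your own features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data (assumption: 5 classes, 500 samples).
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# For a classifier, cv=5 uses stratified 5-fold splits; one accuracy per fold.
scores = cross_val_score(clf, X, y, cv=5)

print("%0.2f accuracy with a standard deviation of %0.2f"
      % (scores.mean(), scores.std()))
```

A small standard deviation across folds suggests the accuracy does not depend much on which part of the data was used for training. It measures variation across data splits rather than across random seeds on a fixed split.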