
How do you get the probabilities of all classes without building a separate classifier for each class?

Given a classification problem, sometimes we do not just want to predict a class; we need to return the probability of each class.

i.e. P(y=0|x), P(y=1|x), P(y=2|x), ..., P(y=C|x)

I want to do this without building a separate classifier for each of y=0, y=1, y=2, ..., y=C, since training C classifiers (say C=100) can be quite slow.

What can be done? Which classifiers naturally give all the probabilities at once (one I know of is a neural network with 100 output nodes)? With a traditional random forest I can't do that, right? I'm using the Python scikit-learn library.

If you want probabilities, look for scikit-learn classifiers that have the predict_proba() method.
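A quick way to see which estimators qualify is to test for the method itself. This is a minimal sketch assuming a reasonably recent scikit-learn; the estimators listed are just examples, with LinearSVC and RidgeClassifier included because they only provide decision_function() scores, not predict_proba().

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Illustrative selection of classifiers (not an exhaustive list).
candidates = {
    "RandomForestClassifier": RandomForestClassifier(),
    "LogisticRegression": LogisticRegression(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "LinearSVC": LinearSVC(),
    "RidgeClassifier": RidgeClassifier(),
}

# Keep only the estimators that expose predict_proba().
with_proba = [name for name, clf in candidates.items() if hasattr(clf, "predict_proba")]
print(with_proba)
# ['RandomForestClassifier', 'LogisticRegression', 'KNeighborsClassifier']
```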

scikit-learn documentation on multiclass classification: http://scikit-learn.org/stable/modules/multiclass.html

All scikit-learn classifiers are capable of multiclass classification. So you don't need to build 100 models yourself.
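As a minimal sketch of this, assuming a synthetic 100-class dataset from make_classification (the sizes are arbitrary), a single RandomForestClassifier returns the whole probability vector for every sample. The columns of predict_proba() follow the order of clf.classes_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 100-class problem; make_classification requires
# n_classes * n_clusters_per_class <= 2 ** n_informative.
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=10,
    n_classes=100, n_clusters_per_class=1, random_state=0,
)

# One model covers all 100 classes; no per-class classifiers are built by hand.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])
print(proba.shape)                          # one row per sample, one column per class
print(proba.sum(axis=1))                    # each row sums to 1
print(clf.classes_[proba.argmax(axis=1)])   # most probable class for each sample
```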

Below is a summary of the classifiers supported by scikit-learn grouped by strategy:

  • Inherently multiclass: Naive Bayes, LDA and QDA, Decision Trees, Random Forests, Nearest Neighbors, setting multi_class='multinomial' in sklearn.linear_model.LogisticRegression.
  • Support multilabel: Decision Trees, Random Forests, Nearest Neighbors, Ridge Regression.
  • One-Vs-One: sklearn.svm.SVC.
  • One-Vs-All: all linear models except sklearn.svm.SVC.
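For the multinomial logistic regression in the list, the probabilities are a softmax over one score per class, so all C probabilities come from one model. The sketch assumes scikit-learn 0.22 or later, where the default lbfgs solver already fits the multinomial model when there are more than two classes (recent releases have deprecated the multi_class parameter itself). It checks predict_proba() against a hand-written softmax of decision_function() on the iris data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Default lbfgs solver: a single multinomial (softmax) model for the 3 classes.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Softmax of the per-class scores, computed by hand for comparison.
scores = clf.decision_function(X[:3])                  # shape (3, n_classes)
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
manual = exp / exp.sum(axis=1, keepdims=True)

print(np.allclose(manual, clf.predict_proba(X[:3])))  # True
```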

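Random forests do indeed give P(y|x) for multiple classes. In most cases P(y|x) can be taken as:

P(y=c|x) = (number of trees that vote for class c) / (total number of trees)

Strictly speaking, scikit-learn's RandomForestClassifier.predict_proba averages the class probabilities predicted by the individual trees, where a tree's class probability is the fraction of training samples of that class in the leaf. With the default, fully grown trees the leaves are usually pure, so this average reduces to the vote fraction above.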
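To see the vote-fraction view in practice, the sketch below (synthetic data from make_classification, default forest settings otherwise) counts each tree's vote by hand and compares the result with predict_proba(). It assumes each tree's predict_proba() columns line up with rf.classes_; with continuous features the fully grown trees have pure leaves, so the two should agree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)
X_train, X_test, y_train = X[:400], X[400:], y[:400]

# Default settings grow each tree until its leaves are pure.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Count one vote per tree for the class it picks.
votes = np.zeros((len(X_test), len(rf.classes_)))
for tree in rf.estimators_:
    # Column index of the class this tree predicts for each test sample.
    picked = tree.predict_proba(X_test).argmax(axis=1)
    votes[np.arange(len(X_test)), picked] += 1
vote_fraction = votes / len(rf.estimators_)

print(np.allclose(vote_fraction, rf.predict_proba(X_test)))  # True
```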

However, you can go further than this. For example, suppose that in one case the top class gets 260 votes, the second class 230 votes and the other 5 classes 10 votes each, while in another case class 1 gets 260 votes and every other class gets 40 votes. The winner has the same 260 votes in both cases, but a much larger lead over the runner-up in the second, so you might feel more confident in the second prediction than in the first. You can therefore come up with a confidence metric suited to your use case.
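One possible confidence metric for these two cases is the margin between the two largest vote fractions. This is only a sketch: the function names are hypothetical, and for case 2 it assumes 6 other classes (the text does not say how many), so both cases have 7 classes.

```python
import numpy as np

def class_probabilities(votes):
    """Turn per-class vote counts into vote fractions."""
    votes = np.asarray(votes, dtype=float)
    return votes / votes.sum()

def top2_margin(proba):
    """Hypothetical confidence metric: gap between the two largest probabilities."""
    top2 = np.sort(proba)[-2:]
    return top2[1] - top2[0]

# Case 1: 260 and 230 votes, then 5 classes with 10 votes each.
case1 = class_probabilities([260, 230, 10, 10, 10, 10, 10])
# Case 2: 260 votes, then (assumed) 6 other classes with 40 votes each.
case2 = class_probabilities([260, 40, 40, 40, 40, 40, 40])

for name, p in [("case 1", case1), ("case 2", case2)]:
    print(name, round(p.max(), 3), round(top2_margin(p), 3))
# case 1 0.481 0.056
# case 2 0.52 0.44
```

The entropy of the probability vector would be another reasonable choice; the margin is used here because it directly captures the close runner-up in case 1.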
