sklearn calibrated classifier with random forest

Scikit-learn has very useful classifier wrappers called CalibratedClassifier and CalibratedClassifierCV, which try to ensure that the predict_proba function of a classifier really predicts a probability, and not just an arbitrary (albeit perhaps well-ranked) number between zero and one.
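
For context, here is a minimal sketch of the typical usage with a random forest. The synthetic data from make_classification and all parameter values are illustrative assumptions, not part of the original question:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap the forest so that predict_proba is rescaled by cross-validated
# sigmoid (Platt) calibration; method="isotonic" is the other option.
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    method="sigmoid",
    cv=5,
)
calibrated.fit(X_train, y_train)
proba = calibrated.predict_proba(X_test)[:, 1]  # calibrated probabilities
```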

However, when using random forests it is customary to use oob_decision_function_ to evaluate performance on the training data, and this attribute is no longer available once the model is wrapped in a calibrator. The calibration should therefore work well for new data but not for the training data. How can we evaluate performance on the training data to detect, e.g., overfitting?
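
A small illustration of the issue, again on assumed synthetic data: the bare forest exposes the attribute, while the calibrated wrapper does not.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# A plain random forest keeps its out-of-bag probabilities:
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_decision_function_.shape)         # (500, 2)

# The calibrated wrapper exposes no such attribute:
cal = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0),
    cv=3,
)
cal.fit(X, y)
print(hasattr(cal, "oob_decision_function_"))  # False
```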

Apparently there really was no solution to this, and so I made a pull request to scikit-learn.

The problem was that the out-of-bag predictions are created during fitting. Within CalibratedClassifierCV, each sub-classifier therefore does have its own oob decision function, but each one is computed only on that sub-classifier's training fold. The fix is to store each fold's OOB predictions (keeping NaN values for the samples that are not in the fold), pass each through its fold's calibration transformation, and then average the calibrated OOB predictions into a single combined OOB prediction.
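
The same idea can be sketched outside the library, without the patch: run the calibration folds by hand, push each forest's OOB probabilities through that fold's calibrator, and average with nanmean. Everything below (the KFold setup, the choice of isotonic calibration, the parameter values) is an illustrative assumption, not the code from the pull request:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=1000, random_state=0)
n_folds = 5
# One row per fold; NaN marks samples outside that fold's training set.
oob_calibrated = np.full((n_folds, len(y)), np.nan)

cv = KFold(n_splits=n_folds, shuffle=True, random_state=0)
for i, (train_idx, cal_idx) in enumerate(cv.split(X)):
    rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                                random_state=0)
    rf.fit(X[train_idx], y[train_idx])

    # Fit this fold's calibrator on the held-out samples, as
    # CalibratedClassifierCV(method="isotonic") would.
    calibrator = IsotonicRegression(out_of_bounds="clip")
    calibrator.fit(rf.predict_proba(X[cal_idx])[:, 1], y[cal_idx])

    # Calibrate the fold's out-of-bag predictions. With enough trees,
    # every training-fold sample is out of bag at least once.
    oob_calibrated[i, train_idx] = calibrator.predict(
        rf.oob_decision_function_[:, 1]
    )

# Average the calibrated OOB predictions across folds, ignoring NaNs.
oob_proba = np.nanmean(oob_calibrated, axis=0)
```

The resulting oob_proba then plays the role of oob_decision_function_ for the calibrated model: no tree ever scores a sample it was trained on, so comparing these averaged probabilities against the labels gives a fair estimate of training-set performance.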

As mentioned, I created a pull request at https://github.com/scikit-learn/scikit-learn/pull/11175. It will probably be a while before it is merged into the package, though, so if anyone really needs this feature, feel free to use my fork of scikit-learn at https://github.com/yishaishimoni/scikit-learn .
