
Using RandomForestClassifier.predict_proba vs RandomForestRegressor.predict

Original: 2013-11-24 18:36:56 | tags: python, scikit-learn

I have a data set comprising a vector of features and a target that is either 1.0 or 0.0 (representing two classes). If I fit a RandomForestRegressor and call its predict function, is it equivalent to using RandomForestClassifier.predict_proba()?

In other words, if the target is 1.0 or 0.0, does RandomForestRegressor output probabilities?

I think so, and the results I am getting suggest so, but I would like a second opinion...

Thanks, Weasel

1 answer

There is a major conceptual difference between the two, based on the different tasks they address:

Regression: continuous (real-valued) target variable.

Classification: discrete target variable (classes).
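A minimal sketch of this distinction in scikit-learn terms, using synthetic data made up for illustration: the regressor returns arbitrary real numbers, while the classifier returns labels taken from its `classes_` attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_continuous = X[:, 0] + 0.1 * rng.normal(size=200)  # real-valued target
y_classes = (X[:, 0] > 0).astype(int)                # discrete target: 0 or 1

reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y_continuous)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_classes)

print(reg.predict(X[:5]))  # real numbers
print(clf.predict(X[:5]))  # labels drawn from clf.classes_
print(clf.classes_)        # [0 1]
```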

For a general classification method, the term "probability of the observation being class X" may not even be defined, as some classification methods (kNN, for example) do not deal with probabilities.
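In scikit-learn this shows up as whether an estimator exposes `predict_proba` at all. The sketch below uses `LinearSVC` as the example of a classifier that only outputs labels and decision scores; it is swapped in for kNN because scikit-learn's `KNeighborsClassifier` does expose `predict_proba` (as the fraction of neighbors in each class).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# RandomForestClassifier provides class probabilities;
# LinearSVC only provides predict() and decision_function().
for est in (RandomForestClassifier(), LinearSVC()):
    print(type(est).__name__, hasattr(est, "predict_proba"))
```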

However, for Random Forest (and some other classification methods), classification is reduced to regression of the class probability distribution. The predicted class is then taken as the argmax of the computed "probabilities". In your case, you feed both models the same input, so you get essentially the same result. And yes, it is OK to treat the values returned by RandomForestRegressor as probabilities.
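A rough way to check this on your own data: the sketch below (synthetic data, assumed for illustration) fits both models on the same 0.0/1.0 target, verifies that `RandomForestClassifier.predict` is the argmax of `predict_proba` over `classes_`, and compares the class-1 column of `predict_proba` with `RandomForestRegressor.predict`. `max_features=None` is set on both because the two classes use different defaults. The split criteria also differ (Gini vs. squared error), but for a 0/1 target the squared-error impurity p(1 - p) is exactly half the Gini impurity 2p(1 - p), so the two forests tend to pick the same splits. Expect the two estimates to be close, not necessarily bit-for-bit identical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic binary data with targets 0.0 / 1.0 (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(float)

# Same hyperparameters for both; max_features defaults differ between them.
clf = RandomForestClassifier(n_estimators=200, max_features=None, random_state=0).fit(X, y)
reg = RandomForestRegressor(n_estimators=200, max_features=None, random_state=0).fit(X, y)

proba = clf.predict_proba(X)
# The predicted class is the argmax of the averaged per-tree probabilities.
assert np.array_equal(clf.predict(X), clf.classes_[proba.argmax(axis=1)])

p_clf = proba[:, 1]     # classes_ is sorted, so column 1 is class 1.0
p_reg = reg.predict(X)  # mean of leaf averages of 0/1 targets
print("max |difference|:", np.abs(p_clf - p_reg).max())
```

Because every regression leaf value is a mean of 0/1 targets, `reg.predict` always stays within [0, 1], which is what makes reading it as a probability reasonable.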