
Scikit-learn RandomForestClassifier output of predict_proba

I have a dataset that I split in two for training and testing a random forest classifier with scikit-learn.

I have 87 classes and 344 samples. The output of predict_proba is, most of the time, a 3-dimensional array of shape (87, 344, 2) (it's actually a list of 87 numpy.ndarrays, each of shape (344, 2)).

Sometimes, when I pick a different subset of samples for training and testing, I only get a 2-dimensional array of shape (87, 344) (though I can't work out in which cases).

My two questions are:

  • What do these dimensions represent? I worked out that to get a ROC AUC score, I have to take one half of the output (that is, (87, 344, 2)[:,:,1]), transpose it, and then compare it with my ground truth (essentially roc_auc_score(ground_truth, output_of_predict_proba[:,:,1].T)). But I don't understand what it really means.
  • Why does the output change with different subsets of the data? I can't understand in which cases it returns a 3D array and in which cases a 2D one.

classifier.predict_proba() returns the class probabilities. The size of the class dimension of the array depends on how many classes there are in the subset you train on.

Are you sure the arrays you're using to fit the RF have the right shape? That is (n_samples, n_features) for the data and (n_samples,) for the target classes. You should get an array Y_pred of shape (n_samples, n_classes), so (344, 87) in your case, where item i of row r is the predicted probability of class i for the sample X[r,:]. Note that sum(Y_pred[r,:]) = 1.
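To make the expected shapes concrete, here is a minimal sketch on synthetic data. The sizes, the random features, and the names rng, clf and Y_pred are assumptions chosen for illustration, not the asker's dataset. With a 1-D target, predict_proba should return a single 2-D array whose rows sum to 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumed sizes, not the asker's 344 x 87 problem)
rng = np.random.RandomState(0)
n_samples, n_features, n_classes = 300, 5, 4

X = rng.rand(n_samples, n_features)      # shape (n_samples, n_features)
y = np.arange(n_samples) % n_classes     # shape (n_samples,), one label per sample

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
Y_pred = clf.predict_proba(X)

print(Y_pred.shape)                           # (n_samples, n_classes)
print(np.allclose(Y_pred.sum(axis=1), 1.0))   # each row is a probability distribution
```

Column i of Y_pred corresponds to the label stored at clf.classes_[i].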

However, I think that if your target array Y has shape (n_samples, n_classes), where each row is all zeros except for a one in the column corresponding to the class of the sample, then sklearn takes it as a multi-output prediction problem (it considers each class independently), but I don't think that's what you'd like to do. In that case, for each class and each sample, you predict the probability of the sample belonging to this class or not.
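The multi-output behaviour can be reproduced with a small sketch. It uses the same kind of synthetic data as an assumption, and Y_onehot, proba and scores are illustrative names. Fitting on a one-hot target gives one probability array per column, which is where the asker's list of (344, 2) arrays would come from.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
n_samples, n_features, n_classes = 300, 5, 4
X = rng.rand(n_samples, n_features)
labels = np.arange(n_samples) % n_classes

# One-hot (indicator) target of shape (n_samples, n_classes)
Y_onehot = np.eye(n_classes, dtype=int)[labels]

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, Y_onehot)
proba = clf.predict_proba(X)

print(type(proba), len(proba))   # a list with one array per target column
print(proba[0].shape)            # (n_samples, 2): columns follow clf.classes_[0], here [0, 1]

# Stack into a 3-D array, keep P(column == 1) for every class, and transpose
scores = np.stack(proba)[:, :, 1].T
print(scores.shape)              # (n_samples, n_classes)
```

The [:, :, 1] slice is the "one half of the output" from the question: for each class it keeps the probability that the indicator is 1. Note that a target column holding only a single value in the training split would produce an array with one column instead of two, so np.stack would fail in that case.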

Finally, the output does indeed depend on the training set, because it depends on the number of classes present in the training set. You can get that number from the fitted attribute n_classes_ (and you may also be able to force the number of classes by setting it manually), and you can get the class labels themselves from the attribute classes_. See the documentation.
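As a rough illustration of that dependence, the sketch below trains on a subset in which one label never occurs. The data is synthetic, and the re-alignment step at the end is one possible workaround added here as an assumption, not something the answer itself proposes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = np.arange(200) % 4           # labels 0, 1, 2, 3

# Train on a subset in which label 3 never appears
mask = y != 3
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X[mask], y[mask])

print(clf.classes_)                  # only the labels seen during fit
print(clf.n_classes_)                # how many labels were seen
print(clf.predict_proba(X).shape)    # (200, 3), not (200, 4)

# Optional: spread the columns over the full label set, leaving zeros
# for labels the model never saw (assumes all_labels is sorted)
all_labels = np.arange(4)
full = np.zeros((X.shape[0], all_labels.size))
full[:, np.searchsorted(all_labels, clf.classes_)] = clf.predict_proba(X)
print(full.shape)                    # (200, 4)
```

Using a stratified split, so that every class appears in the training subset, avoids the mismatch in the first place when each class has enough samples.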

Hope it helps!
