Sklearn - How to predict the probability of every target label

Question

I have a data set with a target variable that can take 7 different labels. Each sample in my training set has exactly one label for the target variable.

For each sample, I want to calculate the probability of each of the target labels, so my prediction would consist of 7 probabilities per row.

On the sklearn website I read about multi-label classification, but this doesn't seem to be what I want.
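One way to see why the multi-label pages don't fit this case: scikit-learn's `sklearn.utils.multiclass.type_of_target` helper classifies a 1-D target with seven distinct values as `"multiclass"` (one label per sample), whereas multi-label problems use a 2-D 0/1 indicator matrix. A minimal sketch with made-up toy arrays:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Toy target: exactly one label per sample, seven possible labels (multiclass)
y_multiclass = np.array([0, 1, 2, 3, 4, 5, 6, 2, 4])

# Toy target: each sample may carry several labels at once (multi-label)
y_multilabel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
])

print(type_of_target(y_multiclass))  # multiclass
print(type_of_target(y_multilabel))  # multilabel-indicator
```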

I tried the following code, but it only gives me one predicted class per sample.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)  # one predicted label per sample

Does anyone have some advice on this? Thanks!

Answer 1

You can do that by simply removing the OneVsRestClassifier and using the predict_proba method of the DecisionTreeClassifier. You can do the following:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)

This gives you a probability for each of your 7 possible classes, i.e. an array of shape (n_samples, 7).
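A self-contained sketch of this approach, assuming synthetic data from `make_classification` in place of the asker's `X_train`/`y_train` (the dataset sizes and `min_samples_leaf=20` are illustrative choices, not part of the answer). It checks the output shape, that each row sums to 1, and that `predict` simply returns the most probable column:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's data: 7 classes, one label per sample
X, y = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Leaves with at least 20 training samples are more likely to mix classes,
# so probabilities are less often exactly 0 or 1
clf = DecisionTreeClassifier(min_samples_leaf=20, random_state=0)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # shape (n_test_samples, 7)
print(proba.shape)
print(clf.classes_)                # column order of proba
print(proba[0].round(3))           # 7 probabilities for the first test row

# predict() picks the most probable class from predict_proba()
assert np.array_equal(clf.predict(X_test), clf.classes_[proba.argmax(axis=1)])
```

With the default, fully grown tree, most leaves are pure, so many rows come out as a single 1.0 and six 0.0 values; constraining `max_depth` or `min_samples_leaf` tends to give smoother probability estimates.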

Hope that helps!

Answer 2

You can try using scikit-multilearn, an extension of sklearn that handles multi-label classification. If your labels are not overly correlated, you can train one classifier per label and get all predictions. Try (after pip install scikit-multilearn):

from sklearn.tree import DecisionTreeClassifier
from skmultilearn.problem_transform import BinaryRelevance

classifier = BinaryRelevance(classifier=DecisionTreeClassifier())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

In your case, predictions will be a sparse matrix of shape (n_samples, n_labels) with n_labels = 7; each column contains the predictions for one label across all samples.

If your labels are correlated, you might need more sophisticated multi-label classification methods.

Disclaimer: I'm the author of scikit-multilearn; feel free to ask more questions.
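Multi-label tools such as binary relevance work on a 2-D 0/1 indicator matrix (one column per label) rather than the single-column target described in the question. To avoid the extra dependency, the sketch below swaps in scikit-learn's own `OneVsRestClassifier`, which fits one independent binary classifier per column when given an indicator matrix, i.e. the same binary-relevance idea; it is not scikit-multilearn's API. `LabelBinarizer` builds the indicator matrix, and the data are synthetic (an assumption standing in for the asker's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 7 classes, one label per sample
X, y = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)

# Turn the 1-D target into a (n_samples, 7) 0/1 indicator matrix
binarizer = LabelBinarizer()
Y = binarizer.fit_transform(y)

# With a 2-D indicator target, OneVsRestClassifier fits one independent
# binary tree per label column (binary relevance)
clf = OneVsRestClassifier(
    DecisionTreeClassifier(min_samples_leaf=20, random_state=0)
)
clf.fit(X, Y)

labels = clf.predict(X[:5])        # (5, 7) matrix of 0/1 predictions
scores = clf.predict_proba(X[:5])  # (5, 7) per-label probabilities

print(binarizer.classes_)          # label for each column
print(scores.round(3))
print(scores.sum(axis=1))          # not normalized in the multi-label case
```

Because each label gets its own independent classifier, the per-row probabilities are marginal estimates and generally do not sum to 1, unlike the multiclass predict_proba output in the other answers.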

Answer 3

If you insist on using the OneVsRestClassifier, you can also call predict_proba(X_test), since OneVsRestClassifier supports it as well.

For example:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)

The order of the labels that the result columns correspond to can be found in:

clf.classes_
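`classes_` holds the labels in sorted order (not order of first appearance), and column j of `predict_proba` corresponds to `classes_[j]`. A minimal sketch assuming synthetic data relabelled with hypothetical letter names (listed in reverse to make the sorting visible) and `LogisticRegression` as the wrapped estimator, which is an assumption; any classifier with `predict_proba` can be wrapped the same way:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in: 7 classes, relabelled with hypothetical string names
X, y_int = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)
names = np.array(["g", "f", "e", "d", "c", "b", "a"])  # hypothetical labels
y = names[y_int]

# LogisticRegression is an assumed base estimator for this sketch
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y)

proba = clf.predict_proba(X[:3])
print(clf.classes_)  # sorted labels: the column order of proba

# Attach a label name to each probability for the first sample
first_row = dict(zip(clf.classes_, proba[0].round(3)))
print(first_row)

# With a 1-D multiclass target, OneVsRestClassifier normalizes each row to 1
print(proba.sum(axis=1))
```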

Answers: 3

Answer 1: score 15, accepted, 2016-07-16 09:05:00
Answer 2: score 3, 2016-07-16 17:23:44
Answer 3: score 1, 2020-04-30 17:39:10