Sklearn - How to predict probability for all target labels
I have a data set with a target variable that can take 7 different labels. Each sample in my training set has exactly one label for the target variable.
For each sample, I want to calculate the probability of each of the target labels, so my prediction would consist of 7 probabilities per row.
On the sklearn website I read about multi-label classification, but this doesn't seem to be what I want.
I tried the following code, but it only gives me one classification per sample.
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
Does anyone have some advice on this? Thanks!
You can do that by simply removing the OneVsRestClassifier and using the predict_proba method of the DecisionTreeClassifier. You can do the following:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)
This returns an array with one row per sample and one column per class, so each row holds a probability for each of your 7 possible classes.
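As a rough, self-contained sketch of the above: the snippet below uses synthetic data from make_classification as a stand-in for the real data set (the sample counts, feature counts, and min_samples_leaf=10 are assumptions chosen only for illustration). Note that a fully grown tree tends to have pure leaves, so its probabilities are often exactly 0 or 1; limiting leaf size is one way to get softer estimates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's data: 7 classes, one label per sample.
X, y = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# min_samples_leaf > 1 (an illustrative choice) keeps leaves from being
# single-sample, so the leaf class fractions are less likely to be all 0 or 1.
clf = DecisionTreeClassifier(min_samples_leaf=10, random_state=0)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # one row per sample, one column per class
labels = clf.predict(X_test)       # one label per sample

print(proba.shape)
print(proba[0])
```

Each row of proba sums to 1, and predict returns the class whose column holds the largest value in that row.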
Hope that helps!
You can try using scikit-multilearn, an extension of sklearn that handles multi-label classification. If your labels are not overly correlated, you can train one classifier per label and get all predictions. Try the following (after pip install scikit-multilearn):
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.tree import DecisionTreeClassifier
classifier = BinaryRelevance(classifier=DecisionTreeClassifier())
# train
classifier.fit(X_train, y_train)
# predict
predictions = classifier.predict(X_test)
predictions will be a sparse matrix of shape (n_samples, n_labels), with n_labels = 7 in your case; each column holds the predictions for one label across all samples.
If your labels are correlated, you might need more sophisticated methods for multi-label classification.
Disclaimer: I'm the author of scikit-multilearn, feel free to ask more questions.
If you insist on using the OneVsRestClassifier, you can also call predict_proba(X_test), since OneVsRestClassifier supports it as well.
For example:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)
The order of the labels for which you get the result can be found in:
clf.classes_
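To make the column order concrete, here is a minimal sketch that pairs each column of the predict_proba output with its entry in clf.classes_. The data is synthetic and the label names are hypothetical, made up only for this illustration; max_depth=4 is likewise an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical label names, deliberately not in alphabetical order.
label_names = np.array(["red", "green", "blue", "cyan", "pink", "gray", "teal"])

# Synthetic stand-in for the real data set (an assumption for illustration).
X, y_int = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
y = label_names[y_int]  # turn integer classes into string labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

clf = OneVsRestClassifier(DecisionTreeClassifier(max_depth=4, random_state=0))
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# Column j of proba corresponds to clf.classes_[j], which holds the
# sorted unique labels rather than the order they first appeared in.
first_row = dict(zip(clf.classes_, proba[0]))
print(clf.classes_)
print(first_row)
```

Keying the probabilities by label this way avoids having to remember the column order later on.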