Sklearn - How to predict probability for all target labels

I have a data set with a target variable that can have 7 different labels. Each sample in my training set has only one label for the target variable.

For each sample, I want to calculate the probability for each of the target labels. So my prediction would consist of 7 probabilities for each row.

On the sklearn website I read about multi-label classification, but this doesn't seem to be what I want.

I tried the following code, but this only gives me one classification per sample.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)  # one predicted label per sample

Does anyone have some advice on this? Thanks!

You can do that by simply removing the OneVsRestClassifier and using the predict_proba method of the DecisionTreeClassifier. You can do the following:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

This will give you an array of shape (n_samples, 7): one probability for each of your 7 possible classes, with the columns in the order given by clf.classes_.
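As a rough end-to-end illustration of the call above, here is a minimal sketch on synthetic data. The data set from make_classification and the min_samples_leaf=10 setting are my own assumptions, not part of the original answer. The setting is there because a fully grown tree usually ends in pure leaves, so its predict_proba output tends to be all 0s and 1s; requiring several samples per leaf gives softer estimates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's data: 7 classes, one label per sample.
X, y = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# min_samples_leaf is an assumption made here to avoid pure leaves.
clf = DecisionTreeClassifier(min_samples_leaf=10, random_state=0)
clf.fit(X_train, y_train)

# One row per test sample, one column per class.
proba = clf.predict_proba(X_test)

print(proba.shape)   # (number of test samples, number of classes)
print(clf.classes_)  # column order of proba
print(proba[0])      # the 7 class probabilities for the first test sample
```

Each row sums to 1, because a tree reports the class fractions of the training samples that ended up in the matching leaf.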

Hope that helps!

You can try using scikit-multilearn, an extension of sklearn that handles multilabel classification. If your labels are not overly correlated, you can train one classifier per label and get all predictions. Try the following (after pip install scikit-multilearn):

from skmultilearn.problem_transform import BinaryRelevance
from sklearn.tree import DecisionTreeClassifier

# one binary classifier per label
classifier = BinaryRelevance(classifier=DecisionTreeClassifier())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

Predictions will contain a sparse matrix of size (n_samples, n_labels); in your case n_labels = 7. Each column contains the prediction for one label across all samples.

In case your labels are correlated you might need more sophisticated methods for multi-label classification.
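To make the "one classifier per label" idea concrete, here is a minimal sketch written with plain scikit-learn. It is not scikit-multilearn's implementation, only an illustration of the same idea under my own assumptions: synthetic data, LabelBinarizer to turn the single label column into 7 binary columns, and max_depth=5 for the trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 7 classes, one label per sample.
X, y = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Turn the single label column into a (n_samples, 7) 0/1 indicator matrix.
lb = LabelBinarizer()
Y_train = lb.fit_transform(y_train)

# Fit one independent binary tree per label column.
models = [
    DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, Y_train[:, j])
    for j in range(Y_train.shape[1])
]

# Column j holds the "label j is present" probability from model j.
scores = np.column_stack([m.predict_proba(X_test)[:, 1] for m in models])

print(scores.shape)
print(lb.classes_)  # column order of scores
```

Because the 7 models are trained independently, the rows of scores are not guaranteed to sum to 1. That suits a true multi-label problem, but it is one reason this route is a looser fit when every sample has exactly one label.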

Disclaimer: I'm the author of scikit-multilearn, feel free to ask more questions.

If you insist on using the OneVsRestClassifier, you can also call predict_proba(X_test), since OneVsRestClassifier supports it as well.

For example:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)  # one column per class

The order of the labels in the result can be found in:

clf.classes_
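A small sketch of how classes_ can be used to attach label names to the probability columns. The string label names, the synthetic data, and max_depth=4 are assumptions made for illustration. The labels are deliberately assigned out of alphabetical order to show that classes_ holds the sorted unique labels, not the order in which they first appear.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y_int = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)

# Hypothetical label names, mapped onto the integer classes 0..6.
names = np.array(["g", "c", "a", "f", "b", "e", "d"])
y = names[y_int]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = OneVsRestClassifier(DecisionTreeClassifier(max_depth=4, random_state=0))
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# Pair each column of the first row with its label.
first_row = dict(zip(clf.classes_, proba[0]))
for label, p in first_row.items():
    print(f"{label}: {p:.3f}")
```

Here column 0 of proba belongs to "a", column 1 to "b", and so on, regardless of how the labels were ordered in y_train.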
