
Predict probabilities using SVM

I wrote this code and wanted to obtain probabilities of classification.

from sklearn import svm
X = [[0, 0], [10, 10],[20,30],[30,30],[40, 30], [80,60], [80,50]]
y = [0, 1, 2, 3, 4, 5, 6]
clf = svm.SVC() 
clf.probability = True
clf.fit(X, y)
prob = clf.predict_proba([[10, 10]])
print(prob)

I obtained this output:

[[0.15376986 0.07691205 0.15388546 0.15389275 0.15386348 0.15383004 0.15384636]]

which is very weird, because the probability should have been

[0 1 0 0 0 0 0]

(Observe that the sample whose class has to be predicted is identical to the 2nd training sample.) Also, the probability obtained for that class is the lowest.

You should disable probability and use decision_function instead, because there is no guarantee that predict_proba and predict return the same result. You can read more about it here in the documentation.

import numpy as np

clf.predict([[10, 10]])  # returns array([1]) as expected

prop = clf.decision_function([[10, 10]])
# returns [[ 4.91666667  6.5         3.91666667  2.91666667
#            1.91666667  0.91666667 -0.08333333]]
prediction = np.argmax(prop)  # returns 1

EDIT: As pointed out by @TimH, the probabilities can be given by clf.decision_function(X). The code below is fixed. Regarding the noted issue of low probabilities when using predict_proba(X), I think the answer is that, according to the official doc here, "... Also, it will produce meaningless results on very small datasets."

The answer resides in understanding what the resulting probabilities of SVMs are. In short, you have 7 classes and 7 points in the 2D plane. What SVMs try to do is find a linear separator between each class and each of the others (the one-vs-one approach). Each time, only 2 classes are compared. What you get are the votes of the pairwise classifiers, after normalization. See a more detailed explanation of the multi-class SVMs of libsvm in this post or here (scikit-learn uses libsvm).
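To make the voting concrete, here is a minimal sketch of my own (not part of the original answer): it asks SVC for the raw pairwise scores via decision_function_shape='ovo' and tallies the votes by hand, assuming libsvm's documented pair ordering (0 vs 1, 0 vs 2, ..., 1 vs 2, ...) and sign convention, where a positive score counts as a vote for the first class of the pair.

from sklearn import svm
import numpy as np

X = [[0, 0], [10, 10], [20, 30], [30, 30], [40, 30], [80, 60], [80, 50]]
y = [0, 1, 2, 3, 4, 5, 6]

# 'ovo' exposes the raw pairwise scores: one column per pair of classes,
# so 7 * 6 / 2 = 21 columns in total
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, y)

d = clf.decision_function([[10, 10]])[0]  # shape (21,)

n_classes = len(clf.classes_)
votes = np.zeros(n_classes)
k = 0
for i in range(n_classes):
    for j in range(i + 1, n_classes):
        if d[k] > 0:        # positive score: a vote for class i
            votes[i] += 1
        else:               # negative score: a vote for class j
            votes[j] += 1
        k += 1

print(votes)             # class 1 should collect the most votes
print(np.argmax(votes))  # 1, matching clf.predict([[10, 10]])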

By slightly modifying your code, we see that the right class is indeed chosen:

from sklearn import svm
import matplotlib.pyplot as plt
import numpy as np


X = [[0, 0], [10, 10],[20,30],[30,30],[40, 30], [80,60], [80,50]]
y = [0, 1, 2, 3, 4, 5, 6]
clf = svm.SVC() 
clf.fit(X, y)

x_pred = X  # score all 7 training points
p = np.array(clf.decision_function(x_pred)) # decision is a voting function
prob = np.exp(p)/np.sum(np.exp(p),axis=1, keepdims=True) # softmax after the voting
classes = clf.predict(x_pred)

for idx, (v, s, c) in enumerate(zip(p, prob, classes)):
    print('Sample={}, Prediction={},\n Votes={} \nP={}, '.format(idx, c, v, s))

The corresponding output is

Sample=0, Prediction=0,
Votes=[ 6.5         4.91666667  3.91666667  2.91666667  1.91666667  0.91666667 -0.08333333] 
P=[ 0.75531071  0.15505748  0.05704246  0.02098475  0.00771986  0.00283998  0.00104477], 
Sample=1, Prediction=1,
Votes=[ 4.91666667  6.5         3.91666667  2.91666667  1.91666667  0.91666667 -0.08333333] 
P=[ 0.15505748  0.75531071  0.05704246  0.02098475  0.00771986  0.00283998  0.00104477], 
Sample=2, Prediction=2,
Votes=[ 1.91666667  2.91666667  6.5         4.91666667  3.91666667  0.91666667 -0.08333333] 
P=[ 0.00771986  0.02098475  0.75531071  0.15505748  0.05704246  0.00283998  0.00104477], 
Sample=3, Prediction=3,
Votes=[ 1.91666667  2.91666667  4.91666667  6.5         3.91666667  0.91666667 -0.08333333] 
P=[ 0.00771986  0.02098475  0.15505748  0.75531071  0.05704246  0.00283998  0.00104477], 
Sample=4, Prediction=4,
Votes=[ 1.91666667  2.91666667  3.91666667  4.91666667  6.5         0.91666667 -0.08333333] 
P=[ 0.00771986  0.02098475  0.05704246  0.15505748  0.75531071  0.00283998  0.00104477], 
Sample=5, Prediction=5,
Votes=[ 3.91666667  2.91666667  1.91666667  0.91666667 -0.08333333  6.5  4.91666667] 
P=[ 0.05704246  0.02098475  0.00771986  0.00283998  0.00104477  0.75531071  0.15505748], 
Sample=6, Prediction=6,
Votes=[ 3.91666667  2.91666667  1.91666667  0.91666667 -0.08333333  4.91666667  6.5       ] 
P=[ 0.05704246  0.02098475  0.00771986  0.00283998  0.00104477  0.15505748  0.75531071], 

And you can also see the decision zones:

X = np.array(X)
y = np.array(y)
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111)

# Evaluate the classifier on a dense grid to draw the decision regions
XX, YY = np.mgrid[0:100:200j, 0:100:200j]
Z = clf.predict(np.c_[XX.ravel(), YY.ravel()])
Z = Z.reshape(XX.shape)

ax.pcolormesh(XX, YY, Z, cmap=plt.cm.Paired)

# Overlay the 7 training points
for idx in range(7):
    ax.scatter(X[idx, 0], X[idx, 1], color='k')

plt.show()

[Plot: the decision regions of the fitted classifier, with the 7 training points in black]

You can read in the docs that...

The SVC method decision_function gives per-class scores for each sample (or a single score per sample in the binary case). When the constructor option probability is set to True, class membership probability estimates (from the methods predict_proba and predict_log_proba) are enabled. In the binary case, the probabilities are calibrated using Platt scaling: logistic regression on the SVM's scores, fit by an additional cross-validation on the training data. In the multiclass case, this is extended as per Wu et al. (2004).

Needless to say, the cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores, in the sense that the "argmax" of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.) Platt's method is also known to have theoretical issues. If confidence scores are required, but these do not have to be probabilities, then it is advisable to set probability=False and use decision_function instead of predict_proba.
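The question above is that inconsistency in action. Here is a minimal sketch reproducing it (the exact probabilities depend on the scikit-learn version, and with a single training point per class the internal cross-validation folds carry essentially no information):

from sklearn import svm
import numpy as np

X = [[0, 0], [10, 10], [20, 30], [30, 30], [40, 30], [80, 60], [80, 50]]
y = [0, 1, 2, 3, 4, 5, 6]

clf = svm.SVC(probability=True)  # triggers the extra Platt-scaling fit
clf.fit(X, y)

sample = [[10, 10]]
print(clf.predict(sample))                   # [1], decided by the pairwise votes
print(np.argmax(clf.predict_proba(sample)))  # may well not be 1: the calibration
                                             # is cross-validated, which is
                                             # meaningless with one point per class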

There is also a lot of confusion about this function among Stack Overflow users, as you can see in this thread, or this one.
