Can I get a list of wrong predictions in scikit-learn?

We can use svm.SVC.score() to evaluate the accuracy of an SVM model. For the wrong predictions, I want to get both the predicted class and the actual class. How can I achieve this in scikit-learn?

The simplest approach is to iterate over your predictions (and the correct labels) and do whatever you want with the mismatches. In the following example I just print them to stdout.

Let's assume that your data is in inputs, your labels are in labels, and your trained SVM is in clf. Then you can just do:

predictions = clf.predict(inputs)
# Walk through each sample together with its prediction and its true label
for x, prediction, label in zip(inputs, predictions, labels):
    if prediction != label:
        print(x, 'has been classified as', prediction, 'and should be', label)
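The same idea can be done without an explicit loop by comparing the two arrays at once. The sketch below is self-contained: it assumes the bundled iris dataset and a train/test split as stand-ins for your own inputs and labels. It builds a boolean mask of wrong predictions, turns it into a list of (index, predicted, actual) tuples, and checks that SVC.score() (mean accuracy) equals one minus the fraction of wrong predictions.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for your own data and labels
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = svm.SVC().fit(X_train, y_train)
predictions = clf.predict(X_test)

# Boolean mask: True where the prediction disagrees with the true label
wrong = predictions != y_test
wrong_idx = np.flatnonzero(wrong)

# (index, predicted, actual) for every misclassified test sample
wrong_list = [(int(i), int(predictions[i]), int(y_test[i])) for i in wrong_idx]
print(wrong_list)

# SVC.score() returns mean accuracy, so 1 - score is the fraction of wrong predictions
print(clf.score(X_test, y_test), 1 - wrong.mean())
```

The mask approach scales better than a Python loop on large test sets, and the mask itself can be reused to index the original inputs (X_test[wrong]).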

It depends on what form you want the incorrect predictions in. For most use cases, a confusion matrix should be sufficient.

A confusion matrix tabulates the actual class against the predicted class, so that the diagonal holds all of the correct predictions and the remaining (off-diagonal) cells hold the incorrect ones.

[Figure: confusion matrix]

You can see a fuller example on scikit-learn's Confusion Matrix example page.
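As a small sketch with made-up labels A, B and C, the matrix can be computed with sklearn.metrics.confusion_matrix, and the wrong predictions can then be read off the off-diagonal cells. In scikit-learn's convention, rows are the actual classes and columns are the predicted classes; passing labels fixes the row/column order.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels
y_true = np.array(["A", "B", "A", "C"])
y_pred = np.array(["A", "C", "B", "C"])
classes = ["A", "B", "C"]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)

# Correct predictions sit on the diagonal; everything else is an error
n_correct = np.trace(cm)
n_wrong = cm.sum() - n_correct

# Each non-empty off-diagonal cell as (actual, predicted, count)
errors = [
    (classes[i], classes[j], int(cm[i, j]))
    for i in range(len(classes))
    for j in range(len(classes))
    if i != j and cm[i, j] > 0
]
print(errors)  # [('A', 'B', 1), ('B', 'C', 1)]
```

Note that the matrix only gives counts per (actual, predicted) pair; to recover which individual samples were wrong you still need to compare the label arrays element by element.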

If you just want a list of all the misclassified values with their predicted and actual classes, you can do something like the following.

Just select all the rows of data where the actual and predicted classes are not equal.

import numpy as np
import pandas as pd

X = np.array([0.1, 0.34, 0.2, 0.98])
y = np.array(["A", "B", "A", "C"])

y_pred = np.array(["A", "C", "B", "C"])

df = pd.DataFrame(X, columns=["X"])
df["actual"] = y
df["predicted"] = y_pred

incorrect = df[df["actual"] != df["predicted"]]

In this case, incorrect would contain the following entries.

      X actual predicted
1  0.34      B         C
2  0.20      A         B
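Once the misclassified rows are in a DataFrame like incorrect, pandas can also summarise them. The sketch below (repeating the toy data so it runs on its own) counts how often each (actual, predicted) mistake occurs with groupby, and builds the full confusion matrix with pd.crosstab, as an alternative to the scikit-learn function.

```python
import numpy as np
import pandas as pd

X = np.array([0.1, 0.34, 0.2, 0.98])
y = np.array(["A", "B", "A", "C"])
y_pred = np.array(["A", "C", "B", "C"])

df = pd.DataFrame({"X": X, "actual": y, "predicted": y_pred})
incorrect = df[df["actual"] != df["predicted"]]

# How often each (actual, predicted) mistake occurs
error_counts = incorrect.groupby(["actual", "predicted"]).size()
print(error_counts)

# The same information as a full confusion matrix (rows: actual, columns: predicted)
cm = pd.crosstab(df["actual"], df["predicted"])
print(cm)
```

Because incorrect keeps the original row index (1 and 2 here), it can also be used to look the misclassified samples up again in the source data.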

You can also build a confusion matrix directly with sklearn. For a binary problem it is a 2×2 matrix; in general it has one row and one column per class.

from sklearn import metrics

my_matrix = metrics.confusion_matrix(Y_test, Y_predicted)

Y_test: array of the true classes of your test set

Y_predicted: array of the predictions made by your model

For a binary problem, the four cells give you the counts of true negatives, false positives, false negatives and true positives. With labels 0 and 1, scikit-learn lays them out as [[TN, FP], [FN, TP]].
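Here is a minimal binary sketch; the Y_test and Y_predicted values are hypothetical. Flattening the 2×2 matrix with ravel() unpacks the four counts in TN, FP, FN, TP order, and boolean masks give the indices of the two kinds of wrong predictions.

```python
import numpy as np
from sklearn import metrics

# Hypothetical binary labels standing in for your test set and model output
Y_test = np.array([0, 1, 1, 0, 1, 0])
Y_predicted = np.array([0, 1, 0, 0, 1, 1])

my_matrix = metrics.confusion_matrix(Y_test, Y_predicted)
# For labels [0, 1], the layout is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = my_matrix.ravel()
print(my_matrix)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=2 FP=1 FN=1 TP=2

# Indices of the two kinds of wrong predictions
fp_idx = np.flatnonzero((Y_test == 0) & (Y_predicted == 1))
fn_idx = np.flatnonzero((Y_test == 1) & (Y_predicted == 0))
print("false positives at", fp_idx, "false negatives at", fn_idx)
```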

Please have a look at this.

I used some of the methods listed above, but today I found something simpler. Try it. If your data has 2 features, you can use this:

X: your data

y: your predictions

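# X: 2-D NumPy array of shape (n_samples, 2); y: NumPy array of 0/1 predictions
# First feature of the samples predicted as class 0 ("false") and as class 1 ("true")
false_x = X[y == 0][:, 0]
true_x = X[y == 1][:, 0]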
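Splitting by predicted class alone does not tell you which predictions were wrong; for that you also need the true labels. A minimal sketch under that assumption, with hypothetical two-feature data, true labels y_true and predictions y_pred, selects the coordinates of the misclassified points and separates false negatives from false positives (for example, to highlight them on a scatter plot).

```python
import numpy as np

# Hypothetical data: 2 features per sample, true labels and model predictions (0/1)
X = np.array([[0.2, 1.0], [0.5, 0.3], [0.9, 0.8], [0.1, 0.4]])
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 0, 1, 1])

# Both feature values of every misclassified sample
wrong = y_pred != y_true
wrong_points = X[wrong]

# Split the errors by type for a binary problem
false_negatives = X[(y_true == 1) & (y_pred == 0)]
false_positives = X[(y_true == 0) & (y_pred == 1)]
print(wrong_points, false_negatives, false_positives, sep="\n")
```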
