Scikit-learn returns wrong classification report and accuracy score

Question

I'm training an SVM with an RBF kernel on 1200 examples of label 2 and 1200 examples of label 1. I thought I was getting 77% accuracy, as measured with sklearn.metrics.accuracy_score. But when I hand-rolled my own accuracy score, like so:

def naive_accuracy(true, pred):
    number_correct = 0
    i = 0
    for y in true:
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)

It got 50%. I believe I've wasted weeks of work based on a false accuracy score and classification report. Can anyone explain why this has happened? I'm very confused as to how this could have happened, and I don't see what I'm doing wrong. When I tested the metrics.accuracy_score function on some dummy data, pred = [1, 1, 2, 2]; test = [1, 2, 1, 2], it gave me 50%, as you'd expect. I think accuracy_score might somehow be erring because of my specific data.

I have 27-feature vectors: 1200 vectors of class 1 and 1200 vectors of class 2. My code is the following:

X = scale(np.asarray(X))
y = np.asarray(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)

######## SVM ########
clf = svm.SVC()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# 77%
print "SVM Accuracy:", accuracy_score(y_test, y_pred) # debugging
# 50%
print "*True* SVM Accuracy:", naive_accuracy(y_test, y_pred) # in-house debugging
# also 77%!
print "Classification report:\n", classification_report(y_test, y_pred) # debugging

Answer 1

Your implementation of naive_accuracy is buggy. Because i is never updated, you are comparing the first element of pred with every element of true.

I would've just left a comment if not for the test case you've designed, which prevented you from zeroing in on the bug yourself.

Try running your code with:

pred = [1, 2, 2, 2]
test = [1, 1, 1, 1]

The accuracy returned will be 1.0, even though only the first prediction matches its true label!
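To make the difference concrete, here is a minimal sketch that runs the question's function next to a position-by-position comparison on both data sets. The names buggy_accuracy and elementwise_accuracy are mine, chosen for illustration; the first is the question's code verbatim under a new name.

```python
def buggy_accuracy(true, pred):
    """The question's function: i stays 0, so only pred[0] is ever read."""
    number_correct = 0
    i = 0
    for y in true:
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)


def elementwise_accuracy(true, pred):
    """Compare each prediction with the true label at the same position."""
    matches = [t == p for t, p in zip(true, pred)]
    return sum(matches) / float(len(true))


# Dummy data from the question: both functions happen to agree.
q_pred, q_test = [1, 1, 2, 2], [1, 2, 1, 2]
question_case = (buggy_accuracy(q_test, q_pred),
                 elementwise_accuracy(q_test, q_pred))

# Data from this answer: the two functions disagree.
a_pred, a_test = [1, 2, 2, 2], [1, 1, 1, 1]
answer_case = (buggy_accuracy(a_test, a_pred),
               elementwise_accuracy(a_test, a_pred))

print(question_case)  # (0.5, 0.5)
print(answer_case)    # (1.0, 0.25)
```

On the question's dummy data the bug is invisible because half of the true labels happen to equal pred[0], which is also the real accuracy there.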

Also worth noting: the buggy code only measures the fraction of true labels that equal pred[0]. If the two classes are equally represented, the expected value it returns is therefore 50% on any random test set, whatever the classifier predicts.
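A quick way to convince yourself of the 50% figure is the sketch below. The label lists are made up to mirror the 1200/1200 split in the question (they are not your data), and buggy_accuracy is just your function under a different name.

```python
def buggy_accuracy(true, pred):
    """The question's function: only pred[0] is compared against true."""
    number_correct = 0
    i = 0
    for y in true:
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)


# Hypothetical balanced labels: 1200 of class 1, 1200 of class 2.
true = [1] * 1200 + [2] * 1200

all_ones = [1] * 2400                # always predicts class 1
perfect = list(true)                 # every prediction correct
inverted = [2] * 1200 + [1] * 1200   # every prediction wrong

results = [buggy_accuracy(true, p) for p in (all_ones, perfect, inverted)]
print(results)  # [0.5, 0.5, 0.5]
```

A perfect classifier and a completely wrong one get the same score, so the number says nothing about the model. On a random split the classes are only approximately balanced, so you would see something close to 50% rather than exactly 50%.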

It is also a good idea to have a test suite with several test cases. A single test case can rarely cover all the possible scenarios in non-trivial cases.

Though a hand-rolled version is not really needed, here is what you should do instead:

def naive_accuracy(true, pred):
    number_correct = 0
    # enumerate supplies the index, so no separate counter is needed
    for i, y in enumerate(true):
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)
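As for the test suite mentioned earlier, a small table of cases along these lines would have caught the bug. This is only a sketch: the cases and the helper run_cases are my own illustration, not an exhaustive suite.

```python
def naive_accuracy(true, pred):
    """Corrected version: the index advances with each element."""
    number_correct = 0
    for i, y in enumerate(true):
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)


# (true labels, predictions, expected accuracy); illustrative cases only.
CASES = [
    ([1, 2, 1, 2], [1, 2, 1, 2], 1.0),   # everything correct
    ([1, 1, 1, 1], [2, 2, 2, 2], 0.0),   # everything wrong
    ([1, 1, 1, 1], [1, 2, 2, 2], 0.25),  # only the first prediction correct
    ([1, 2, 1, 2], [1, 1, 2, 2], 0.5),   # dummy data from the question
    ([2, 2, 2, 1], [1, 2, 2, 2], 0.5),   # first prediction wrong
]


def run_cases(accuracy_fn):
    """Return the cases for which accuracy_fn disagrees with the expectation."""
    return [(t, p, e) for t, p, e in CASES if accuracy_fn(t, p) != e]


failures = run_cases(naive_accuracy)
print("failing cases:", failures)
```

Passing the original function from the question to run_cases instead reports three of these five cases as failures, including the very first one.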

Answer 1: score 6, accepted, 2014-09-30 07:38:38
