Scikit-learn returning incorrect classification report and accuracy score

I'm training an SVM with an RBF kernel on 1200 examples of label 2 and 1200 examples of label 1. I thought I was getting 77% accuracy, as computed with sklearn.metrics.accuracy_score. But when I hand-rolled my own accuracy score, like so:

def naive_accuracy(true, pred):
    number_correct = 0
    i = 0
    for y in true:
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)

It got 50%. I believe I've wasted weeks of work based on a false accuracy score and classification report. Can anyone explain why this has happened? I'm very confused, and I don't see what I'm doing wrong. When I tested the metrics.accuracy_score function on some dummy data, pred = [1, 1, 2, 2] and test = [1, 2, 1, 2], it gave me 50%, as you'd expect. I think accuracy_score might be erring because of something in my specific data.

I have 27-feature vectors: 1200 vectors of class 1 and 1200 vectors of class 2. My code is the following:

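import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# X (27-feature vectors) and y (labels 1 and 2) are loaded earlier; not shown.
X = scale(np.asarray(X))
y = np.asarray(y)
X_train, X_test, y_train, y_test = train_test_split(X, y)

######## SVM ########
clf = svm.SVC()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# 77%
print("SVM Accuracy:", accuracy_score(y_test, y_pred))  # debugging
# 50%
print("*True* SVM Accuracy:", naive_accuracy(y_test, y_pred))  # in-house debugging
# also 77%!
print("Classification report:\n", classification_report(y_test, y_pred))  # debugging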

Your implementation of naive_accuracy is buggy. Because i is never updated, you are comparing the first element of pred with every element of true.

I would have just left a comment, if not for the test case you designed, which kept you from zeroing in on the bug yourself.

Try running your code with:

pred = [1, 2, 2, 2]
test = [1, 1, 1, 1]

The accuracy returned will be 1.0, even though only one of the four predictions is correct!
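To see this concretely, here is a small sketch that runs the function from the question on both inputs. The name buggy_accuracy is mine, used only to keep it apart from a corrected version; the body is unchanged from the question.

```python
def buggy_accuracy(true, pred):
    # Copied from the question: i is set once and never incremented,
    # so every label in `true` is compared against pred[0].
    number_correct = 0
    i = 0
    for y in true:
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)

# The asker's dummy data: the bug happens to give the right answer here.
dummy = buggy_accuracy([1, 2, 1, 2], [1, 1, 2, 2])

# The case above: only the first prediction is right, yet the score is perfect.
misleading = buggy_accuracy([1, 1, 1, 1], [1, 2, 2, 2])

print(dummy, misleading)
```

The dummy data passes because exactly half of the true labels equal pred[0], which is also the real accuracy for that input, so the test could not tell the two apart.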

Also worth noting: if the two classes are equally represented in the test set, the buggy code returns the fraction of true labels equal to pred[0], which is about 50% on any random test set, whatever the predictions are.
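A rough simulation of that claim, under assumptions of my own: the 600-element test set stands in for a random 25% split of the 2400 examples, the predictions are drawn at random, and buggy_accuracy is an equivalent restatement of the function from the question (it only ever looks at pred[0]).

```python
import random

def buggy_accuracy(true, pred):
    # Equivalent to the question's function: only pred[0] is ever consulted.
    first = pred[0]
    return sum(1.0 for y in true if y == first) / len(true)

random.seed(0)  # arbitrary seed, only for repeatability
labels = [1] * 1200 + [2] * 1200
random.shuffle(labels)
test = labels[:600]  # stand-in for a random test split

scores = []
for _ in range(200):
    pred = [random.choice([1, 2]) for _ in test]  # hypothetical predictions
    scores.append(buggy_accuracy(test, pred))

share_of_1 = test.count(1) / len(test)
# Every score equals the share of label 1 or the share of label 2 in `test`.
print(sorted(set(scores)), share_of_1)
```

However the predictions change, the score can only take two values, and both sit near 0.5 when the split is roughly balanced. That is why the hand-rolled number looked like a plausible "chance level" result.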

It is also a good idea to have a test suite with several test cases. A single test case can rarely cover all the possible scenarios in non-trivial cases.
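As an illustration, a minimal suite might look like the sketch below. The function accuracy, the CASES table and run_cases are hypothetical names of mine, and the zip-based body is just one way to write a correct accuracy function.

```python
def accuracy(true, pred):
    # Pair each true label with its own prediction.
    if len(true) != len(pred):
        raise ValueError("true and pred must have the same length")
    return sum(1.0 for t, p in zip(true, pred) if t == p) / len(true)

# (true, pred, expected accuracy); each case targets a different situation.
CASES = [
    ([1, 2, 1, 2], [1, 2, 1, 2], 1.0),   # all correct
    ([1, 2, 1, 2], [2, 1, 2, 1], 0.0),   # all wrong
    ([1, 1, 1, 1], [1, 2, 2, 2], 0.25),  # only the first prediction is right
    ([1, 1, 1, 1], [2, 1, 1, 1], 0.75),  # only the first prediction is wrong
    ([1, 2, 1, 2], [1, 1, 2, 2], 0.5),   # the dummy data from the question
]

def run_cases(fn):
    # Return the cases where fn disagrees with the expected value.
    return [(t, p, e, fn(t, p)) for t, p, e in CASES if fn(t, p) != e]

failures = run_cases(accuracy)
print("failures:", failures)
```

Passing the function from the question to run_cases instead would flag every case except the last one, which is the only one the original dummy test exercised.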

Though it is not really needed, since accuracy_score already gives the right answer, here is what your function should look like instead:

def naive_accuracy(true, pred):
    number_correct = 0
    for i, y in enumerate(true):  # i now tracks the position of y in true
        if pred[i] == y:
            number_correct += 1.0
    return number_correct / len(true)
