Scikit's Average Precision Score bad input shape

I'm trying to plot a precision-recall curve. This is the code I have:

    import matplotlib.pyplot as plt
    from sklearn import preprocessing
    from sklearn.metrics import average_precision_score, precision_recall_curve

    # clf, test_set and test_tags come from earlier code (not shown)
    lbl_enc = preprocessing.LabelEncoder()
    labels = lbl_enc.fit_transform(test_tags)

    y_score = clf.predict_proba(test_set)

    # This call raises "ValueError: bad input shape (119, 2)"
    average_precision = average_precision_score(labels, y_score)
    print('Average precision-recall score: {0:0.2f}'.format(average_precision))

    precision, recall, _ = precision_recall_curve(labels, y_score)

    plt.step(recall, precision, color='b', alpha=0.2,
             where='post')
    plt.fill_between(recall, precision, step='post', alpha=0.2,
                     color='b')

    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.ylim([0.0, 1.05])
    plt.xlim([0.0, 1.0])
    plt.title('2-class Precision-Recall curve: Average P-R = {0:0.2f}'.format(
        average_precision))
    plt.show()

When I compute average_precision_score, I get "ValueError: bad input shape (119, 2)", which is caused by the y_score variable.

y_score looks like this:

    array([[0.45953712, 0.54046288],
           [0.78289908, 0.21710092],
           [0.13488789, 0.86511211],
           [0.56162583, 0.43837417],
           (...)
           [0.4595595 , 0.5404405 ]])

while labels looks like this:

    array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 1, 1, 1, 1, 1, 1])

How can I make this work to calculate the average precision score? Thanks in advance.
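The mismatch can be reproduced in isolation. The sketch below is only an approximation of the asker's setup: a LogisticRegression fitted on synthetic data from make_classification stands in for the unseen clf, test_set and test_tags, and the 119 test rows merely mirror the shape in the error. It shows that the labels are 1-D while predict_proba returns one column per class, and that passing the 2-D array raises a ValueError (the exact message varies between scikit-learn versions).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

# Hypothetical stand-in data: 200 training rows and 119 test rows, two classes.
X, y = make_classification(n_samples=319, random_state=0)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

clf = LogisticRegression().fit(X_train, y_train)
y_score = clf.predict_proba(X_test)

print(y_test.shape)   # (119,)   one label per sample
print(y_score.shape)  # (119, 2) one probability column per class

error = None
try:
    # Passing the full 2-D probability matrix for a binary target fails.
    average_precision_score(y_test, y_score)
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```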

The scikit-learn documentation for average_precision_score says:

y_score : array, shape = [n_samples] or [n_samples, n_classes]

Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by "decision_function" on some classifiers).
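The 1-D "probability estimates of the positive class" form can be taken from predict_proba by selecting a single column. As a sketch, again assuming a LogisticRegression on synthetic make_classification data in place of the asker's classifier, the column order follows clf.classes_, so the positive class's column can be looked up there rather than hard-coded (pos_label = 1 is an assumption matching the 0/1 labels in the question).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data and classifier.
X, y = make_classification(n_samples=319, random_state=0)
clf = LogisticRegression().fit(X[:200], y[:200])
proba = clf.predict_proba(X[200:])

# predict_proba returns one column per class, in the order of clf.classes_.
print(clf.classes_)  # [0 1]

# Look up the positive class's column instead of assuming its index.
pos_label = 1
pos_col = int(np.flatnonzero(clf.classes_ == pos_label)[0])
y_score_pos = proba[:, pos_col]

# Each row sums to 1, so with two classes the positive-class column
# carries all the information needed for ranking.
assert np.allclose(proba.sum(axis=1), 1.0)
print(y_score_pos.shape)  # (119,)
```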

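Since labels is binary, y_score should be the 1-D array of positive-class probabilities. predict_proba returns one column per class, ordered as in clf.classes_, so the second column holds the probability of class 1. Therefore, I believe you just need to do: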

    average_precision = average_precision_score(labels, y_score[:, 1])
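The precision_recall_curve call in the question has the same problem and takes the same 1-D column. Below is a minimal end-to-end sketch of the corrected flow, not the asker's actual code: the synthetic data, the string tags "neg"/"pos" and the LogisticRegression are hypothetical stand-ins. LabelEncoder sorts the tags, so "pos" maps to 1 and column 1 of predict_proba is its probability. The plotting lines from the question can then be used unchanged on precision and recall.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical stand-ins for the asker's features and string tags.
X, y = make_classification(n_samples=319, random_state=0)
tags = np.where(y == 1, "pos", "neg")

# LabelEncoder sorts classes: "neg" -> 0, "pos" -> 1.
lbl_enc = preprocessing.LabelEncoder()
labels_all = lbl_enc.fit_transform(tags)
labels = labels_all[200:]

clf = LogisticRegression().fit(X[:200], labels_all[:200])

# Keep only the probability of class 1, giving shape (n_samples,).
y_score = clf.predict_proba(X[200:])[:, 1]

average_precision = average_precision_score(labels, y_score)
print('Average precision-recall score: {0:0.2f}'.format(average_precision))

# The curve needs the same 1-D scores.
precision, recall, _ = precision_recall_curve(labels, y_score)
```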
