
Precision, Recall, F-score requiring equal inputs

I am computing precision, recall, and F-score with scikit-learn, using:

import numpy as np
from sklearn.metrics import precision_score

Then:

y_true = np.array(["one", "two", "three"])
y_pred = np.array(["one", "two"])

precision = precision_score(y_true, y_pred, average=None)
print(precision)

The error returned is:

ValueError: Found input variables with inconsistent numbers of samples: [3, 2]

The two input arrays have different lengths. Why does scikit-learn require them to contain the same number of samples? This surprised me particularly for recall (which I would have thought involved taking more guesses than there are answers).

I can implement my own metrics or just trim the arrays so they match. Is there any underlying reason why I should not?

It really depends on what y_true and y_pred mean in your case. Generally, y_true is a vector giving the true value for every element of y_pred, position by position. I think that is not how your arrays are set up, so to use scikit-learn's metrics you would need to put them in that format.
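As a rough sketch of what "that format" could look like for the arrays in the question: one option is to fill the missing third prediction with a placeholder label so both arrays line up position by position. The placeholder "none", the explicit labels list, and the choice to count a missing prediction as a miss are all assumptions made for illustration, not something the question specifies.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array(["one", "two", "three"])
# Assumption: the third sample got no prediction, so we mark it with a
# placeholder label ("none") that never appears in y_true.
y_pred = np.array(["one", "two", "none"])

labels = ["one", "two", "three"]  # score only the real classes, in this order

# zero_division=0 avoids a warning for "three", which is never predicted.
precision = precision_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
recall = recall_score(y_true, y_pred, labels=labels, average=None, zero_division=0)

print(precision)  # per-class precision for "one", "two", "three"
print(recall)     # per-class recall for "one", "two", "three"
```

With this encoding the unpredicted sample lowers recall for "three" instead of silently disappearing, which may or may not be what you want.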

So in the case of binary classification, precision will be:

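# Assumes y_true and y_pred are NumPy arrays of 0/1 labels with the same length
correct_classifications = (y_true == y_pred).astype(int)
# True positives divided by all predicted positives
precision = sum(y_pred * correct_classifications) / sum(y_pred)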

Here you see that you need y_true and y_pred to be the same length.
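To make the pairing explicit, here is a minimal pure-Python sketch that computes both precision and recall for 0/1 labels. The function name binary_precision_recall and the sample data are made up for illustration. This is not scikit-learn's implementation, only the same idea written out by hand.

```python
def binary_precision_recall(y_true, y_pred):
    """Precision and recall for 0/1 labels, compared element by element."""
    if len(y_true) != len(y_pred):
        # Without a 1:1 pairing there is no way to tell hits from misses.
        raise ValueError(
            f"inconsistent numbers of samples: [{len(y_true)}, {len(y_pred)}]"
        )
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall = binary_precision_recall(y_true, y_pred)
print(precision, recall)
```

Both quantities depend on knowing, for each position, whether the prediction matched the truth, so recall needs aligned inputs just as much as precision does.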

That is quite simply because sklearn is playing it safe here.

It doesn't make sense to compute these metrics when you haven't made a prediction for every point in the test set.

Let's say you have 1M data points in your dataset but you only predict 200k. Are those the first 200k points? The last? Spread all over? How would the library know which prediction matches which true value?

You need a 1:1 correspondence between the inputs to the metrics calculation. If you don't have predictions for some points, throw them out (but make sure you know why those predictions are missing in the first place, in case it's a problem with the pipeline). You don't want to report 100% recall at 1% precision when in the end you only predicted for 10% of the dataset.
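One way to throw out the unpredicted points while keeping track of how much of the dataset was actually scored is sketched below. It assumes missing predictions are marked with None at the position of the sample they belong to. The helper name drop_missing and the sample data are hypothetical.

```python
def drop_missing(y_true, y_pred, missing=None):
    """Keep only positions that have a prediction; also return coverage."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be aligned position by position")
    # Keep each (truth, prediction) pair whose prediction is not the marker.
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not missing]
    kept_true = [t for t, _ in pairs]
    kept_pred = [p for _, p in pairs]
    # Fraction of the dataset that actually received a prediction.
    coverage = len(pairs) / len(y_true) if y_true else 0.0
    return kept_true, kept_pred, coverage


y_true = ["one", "two", "three", "one", "two"]
y_pred = ["one", None, "three", "two", None]

kept_true, kept_pred, coverage = drop_missing(y_true, y_pred)
print(kept_true, kept_pred, coverage)
```

The filtered lists have equal length and can be passed to precision_score. Reporting coverage next to the metrics makes it clear when the scores describe only part of the dataset.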
