Class-wise precision and recall for multi-class classification in TensorFlow?

Is there a way to get per-class precision or recall when doing multi-class classification with TensorFlow?

For example, if I have y_true and y_pred from each batch, is there a functional way to get precision or recall per class when I have more than 2 classes?

Here's a solution that works for me on a problem with n=6 classes. If you have many more classes, this solution is probably slow and you should use some sort of mapping instead of a loop.

Assume you have one-hot encoded class labels in the rows of tensor labels, and logits (or posteriors) in tensor logits. Then, if n is the number of classes, try this:

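import tensorflow as tf  # TensorFlow 1.x API (tf.metrics)

# Convert one-hot labels and logits to integer class ids.
y_true = tf.argmax(labels, 1)
y_pred = tf.argmax(logits, 1)

# One (value, update_op) pair per class.
recall = [None] * n
update_op_rec = [None] * n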

for k in range(n):
    recall[k], update_op_rec[k] = tf.metrics.recall(
        labels=tf.equal(y_true, k),
        predictions=tf.equal(y_pred, k)
    )

Note that inside tf.metrics.recall, the labels and predictions arguments are boolean vectors, just as in the binary (two-class) case, which is what allows the function to be used here.
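To make the one-vs-rest trick concrete, here is a minimal NumPy sketch of what the loop computes for a single batch. It is not the TensorFlow implementation; the helper name per_class_recall and the sample arrays are made up for illustration, and returning 0.0 for a class that never occurs is my own choice.

```python
import numpy as np

def per_class_recall(y_true, y_pred, n):
    """One-vs-rest recall for each of n classes (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for k in range(n):
        is_k_true = (y_true == k)   # boolean labels for class k
        is_k_pred = (y_pred == k)   # boolean predictions for class k
        tp = int(np.sum(is_k_true & is_k_pred))
        fn = int(np.sum(is_k_true & ~is_k_pred))
        # Recall is undefined when the class never occurs; report 0.0 here.
        recalls.append(tp / (tp + fn) if (tp + fn) > 0 else 0.0)
    return recalls

# Illustrative data: 3 classes, 6 samples.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]
print(per_class_recall(y_true, y_pred, 3))  # [0.5, 0.5, 1.0]
```

Swapping the roles of the two masks (counting false positives instead of false negatives) gives per-class precision in the same way.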

Two facts:

  1. As stated in other answers, TensorFlow's built-in precision and recall metrics don't support multi-class (the doc says the inputs "will be cast to bool").

  2. There are ways of getting one-versus-all scores, either by using precision_at_k and specifying the class_id, or by simply casting your labels and predictions to tf.bool in the right way.

Because this is unsatisfying and incomplete, I wrote tf_metrics, a simple package for multi-class metrics that you can find on GitHub. It supports multiple averaging methods, like scikit-learn.

Example

import tensorflow as tf
import tf_metrics

y_true = [0, 1, 0, 0, 0, 2, 3, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 2, 0, 3, 3, 1]
pos_indices = [1]        # Metrics for class 1 -- or
pos_indices = [1, 2, 3]  # Average metrics, 0 is the 'negative' class
num_classes = 4
average = 'micro'

# Tuple of (value, update_op)
precision = tf_metrics.precision(
    y_true, y_pred, num_classes, pos_indices, average=average)
recall = tf_metrics.recall(
    y_true, y_pred, num_classes, pos_indices, average=average)
f2 = tf_metrics.fbeta(
    y_true, y_pred, num_classes, pos_indices, average=average, beta=2)
f1 = tf_metrics.f1(
    y_true, y_pred, num_classes, pos_indices, average=average)
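For readers unfamiliar with the averaging modes, the following is a rough pure-Python sketch of the usual scikit-learn-style definitions of micro and macro averaging, restricted to the positive classes. It does not use tf_metrics and is not taken from its source; the helper names counts and precision_recall are hypothetical, and treating an empty denominator as 0.0 is an assumption.

```python
def counts(y_true, y_pred, cls):
    """Return (TP, FP, FN) with cls treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    return tp, fp, fn

def ratio(num, den):
    # Assumption: an undefined ratio is reported as 0.0.
    return num / den if den else 0.0

def precision_recall(y_true, y_pred, pos_indices, average="micro"):
    per_class = [counts(y_true, y_pred, c) for c in pos_indices]
    if average == "micro":
        # Pool the counts of all positive classes, then take one ratio.
        tp = sum(c[0] for c in per_class)
        fp = sum(c[1] for c in per_class)
        fn = sum(c[2] for c in per_class)
        return ratio(tp, tp + fp), ratio(tp, tp + fn)
    # "macro": compute the ratio per class, then take the unweighted mean.
    precisions = [ratio(tp, tp + fp) for tp, fp, _ in per_class]
    recalls = [ratio(tp, tp + fn) for tp, _, fn in per_class]
    return sum(precisions) / len(per_class), sum(recalls) / len(per_class)

y_true = [0, 1, 0, 0, 0, 2, 3, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 2, 0, 3, 3, 1]
print(precision_recall(y_true, y_pred, [1, 2, 3], average="micro"))  # (0.5, 0.75)
print(precision_recall(y_true, y_pred, [1, 2, 3], average="macro"))
```

Micro averaging weights every prediction equally, while macro averaging weights every class equally, so rare classes influence the macro score more.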

I believe you cannot do multi-class precision, recall, or F1 with the tf.metrics.precision/recall functions. You can use sklearn like this for a 3-class scenario:

from sklearn.metrics import precision_recall_fscore_support as score

prediction = [1,2,3,2] 
y_original = [1,2,3,3]

precision, recall, f1, _ = score(y_original, prediction)

print('precision: {}'.format(precision))
print('recall: {}'.format(recall))
print('fscore: {}'.format(f1))

This prints arrays of per-class precision, recall, and F-score values; format them as you like.

I was puzzled by this problem for quite a long time. I know it can be solved with sklearn, but I really wanted to solve it with TensorFlow's API. By reading its code, I finally figured out how this API works.

tf.metrics.precision_at_k(labels, predictions, k, class_id)
  • Firstly, let's assume this is a 4-class problem.
  • Secondly, we have two samples whose labels are 3 and 1 and whose predictions are [0.5,0.3,0.1,0.1] and [0.5,0.3,0.1,0.1]. According to these predictions, both samples are predicted as class 1.
  • Thirdly, to get the precision of class 1, use the formula TP/(TP+FP); we expect the result to be 1/(1+1)=0.5. Both samples are predicted as 1, but one of them is actually 3, so TP is 1, FP is 1, and the result is 0.5.
  • Finally, let's use this API to verify our assumption.

     import tensorflow as tf  # TensorFlow 1.x API

     # Zero-based labels: the classes 3 and 1 become 2 and 0.
     labels = tf.constant([[2], [0]], tf.int64)
     predictions = tf.constant([[0.5, 0.3, 0.1, 0.1], [0.5, 0.3, 0.1, 0.1]])
     precision, update = tf.metrics.precision_at_k(labels, predictions, 1, class_id=0)

     sess = tf.Session()
     sess.run(tf.local_variables_initializer())
     sess.run(update)            # accumulate the batch first
     print(sess.run(precision))  # 0.5

NOTICE

  • k isn't the number of classes. It is the number of top-scoring predictions considered for each sample. The last dimension of predictions is the number of classes, so k must not exceed it.

  • class_id is the class for which we want binary metrics.

  • With k=1, only the single highest-scoring class of each sample is considered, so the metric reduces to a binary (one-vs-rest) problem for the class given by class_id. With a larger k, class_id counts as predicted whenever it appears anywhere in the top k, and the result is no longer the ordinary per-class precision.

  • One more important thing: to get the right result, subtract 1 from the labels if they start at 1, because class_id is the index of the class and indices start at 0.
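The roles of k and class_id can be illustrated with a small NumPy sketch. This is my reading of the documented behaviour for a given class_id (among samples where class_id is in the top k, the fraction whose label really is class_id), not TensorFlow's code; the function name precision_at_k_for_class is hypothetical and a single integer label per sample is assumed.

```python
import numpy as np

def precision_at_k_for_class(labels, scores, k, class_id):
    """Precision for class_id over samples where it is among the top-k scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    # Indices of the k highest scores in each row.
    top_k = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    predicted = np.any(top_k == class_id, axis=1)   # class_id is in the top k
    correct = predicted & (labels == class_id)      # ...and is the true label
    n_pred = int(predicted.sum())
    return int(correct.sum()) / n_pred if n_pred else float("nan")

labels = [2, 0]  # zero-based: the original classes 3 and 1, minus 1
scores = [[0.5, 0.3, 0.1, 0.1], [0.5, 0.3, 0.1, 0.1]]
print(precision_at_k_for_class(labels, scores, k=1, class_id=0))  # 0.5
```

With k=1 the top-k set is just the argmax, so the result matches the hand calculation TP/(TP+FP) = 1/(1+1).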

There is a way to do this in TensorFlow.

tf.metrics.precision_at_k(labels, predictions, k, class_id)

Set k = 1 and set the corresponding class_id. For example, use class_id=0 to calculate the precision of the first class.

I believe TF does not provide such functionality yet. According to the docs (https://www.tensorflow.org/api_docs/python/tf/metrics/precision), both the labels and predictions will be cast to bool, so it applies only to binary classification. Perhaps it's possible to one-hot encode the examples and it would work? But I'm not sure about this.
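A short illustration of why the cast to bool matters. NumPy's astype(bool) stands in for the cast here, on the assumption that TensorFlow's cast treats any non-zero value as True in the same way; the array is made up.

```python
import numpy as np

y_true = np.array([0, 1, 2, 3])

# Casting integer class ids to bool keeps only "zero vs. non-zero",
# so classes 1, 2 and 3 become indistinguishable.
as_bool = y_true.astype(bool)
print(as_bool.tolist())      # [False, True, True, True]

# Comparing against one class id gives a proper one-vs-rest target instead.
is_class_2 = (y_true == 2)
print(is_class_2.tolist())   # [False, False, True, False]
```

Column k of a one-hot encoding is exactly the vector y_true == k, so one-hot labels only help if the metric is computed separately for each column.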

Here's a complete example, from predicting in TensorFlow to reporting via scikit-learn:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report

# Given a trained model `model`, test inputs `X_test` and true labels `y_test`,
# where `y_test` and `y_predicted` are integer class ids whose names are
# indexed in `labels`.
y_predicted = tf.argmax(model.predict(X_test), axis=1)

# Confusion matrix
cf = tf.math.confusion_matrix(y_test, y_predicted)
plt.matshow(cf, cmap='magma')
plt.colorbar()
plt.xticks(np.arange(len(labels)), labels=labels, rotation=90)
plt.yticks(np.arange(len(labels)), labels=labels)
plt.clim(0, None)

# Report
print(classification_report(y_test, y_predicted, target_names=labels))
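If you would rather stay with the confusion matrix than call scikit-learn, per-class precision and recall can be read off it directly. A minimal NumPy sketch, assuming the tf.math.confusion_matrix layout (rows are true labels, columns are predictions); the function name per_class_from_confusion and the example matrix are made up, and a class with an empty row or column is reported as 0.0 by assumption.

```python
import numpy as np

def per_class_from_confusion(cf):
    """Per-class precision and recall from a confusion matrix (rows = truth)."""
    cf = np.asarray(cf, dtype=float)
    tp = np.diag(cf)
    predicted = cf.sum(axis=0)  # column sums: samples predicted as each class
    actual = cf.sum(axis=1)     # row sums: samples that truly belong to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

# Illustrative 3-class confusion matrix.
cf = [[2, 1, 0],
      [0, 3, 1],
      [0, 0, 3]]
precision, recall = per_class_from_confusion(cf)
print(precision)  # precision per class
print(recall)     # recall per class
```

A tensor returned by tf.math.confusion_matrix can be converted with .numpy() in eager mode before being passed to this helper.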

