
How can I calculate the F1-score and other classification metrics from a faster-RCNN? (object detection in PyTorch)

I'm trying to wrap my head around this but struggling to understand how I can compute the F1-score in an object detection task.

Ideally, I would like to know the false positives, true positives, false negatives and true negatives for every target in the image (it's a binary problem with an object in the image as one class and the background as the other class).

Eventually I would also like to extract the false positive bounding boxes from the image. I'm not sure if this is efficient, but I'd save the image names and bbox predictions and whether they are false positives etc. into a numpy file.
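A minimal sketch of what I have in mind for that last part (the row layout and file name are just placeholders, nothing is fixed yet):

import numpy as np

# one row per predicted box: [image_name, x1, y1, x2, y2, score, is_false_positive]
records = [
    ["img_001.png", 10.0, 20.0, 50.0, 80.0, 0.91, False],
    ["img_002.png", 5.0, 5.0, 40.0, 60.0, 0.33, True],
]

# mixed types (strings/floats/bools), so store as an object array
np.save("fp_records.npy", np.array(records, dtype=object))

# load back later for analysis (object arrays need allow_pickle=True)
loaded = np.load("fp_records.npy", allow_pickle=True)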

I currently have this set up with a batch size of 1 so I can apply a non-maximum suppression algorithm per image:

import torch
import torchvision

def apply_nms(orig_prediction, iou_thresh=0.3):
    
    # torchvision returns the indices of the bboxes to keep
    keep = torchvision.ops.nms(orig_prediction['boxes'], orig_prediction['scores'], iou_thresh)
    
    final_prediction = orig_prediction
    final_prediction['boxes'] = final_prediction['boxes'][keep]
    final_prediction['scores'] = final_prediction['scores'][keep]
    final_prediction['labels'] = final_prediction['labels'][keep]
    
    return final_prediction


cpu_device = torch.device("cpu")
model.eval()
with torch.no_grad():
  for images, targets in valid_data_loader:
    images = list(img.to(device) for img in images)
    outputs = model(images)
    outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
    predictions = apply_nms(outputs[0], iou_thresh=0.3)

Any idea on how I can determine the aforementioned classification metrics and F1-score?

I've come across this line in the evaluation code provided by torchvision and I'm wondering whether it would help me going forward:

res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}

The use of the terms precision, recall, and F1 score in object detection is slightly confusing because these metrics were originally used for binary evaluation tasks (e.g. classification). In any case, in object detection they have slightly different meanings:

Let:

TP - set of predicted objects that are successfully matched to a ground truth object (above the IOU threshold for whatever dataset you're using, generally 0.5 or 0.7)
FP - set of predicted objects that were not successfully matched to a ground truth object
FN - set of ground truth objects that were not successfully matched to a predicted object

Precision: TP / (TP + FP)
Recall:    TP / (TP + FN)
F1:        2*Precision*Recall / (Precision + Recall)
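As a quick sanity check, once you have those three counts the metrics are just a few lines of Python (a minimal sketch; the zero-division guards are my own addition):

def precision_recall_f1(tp, fp, fn):
    # guard against empty denominators so the metrics default to 0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return precision, recall, f1

# e.g. 80 matched detections, 20 unmatched detections, 10 missed ground truths
print(precision_recall_f1(80, 20, 10))  # (0.8, 0.888..., 0.842...)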

You can find many implementations of the matching step (matching ground truth and predicted objects), generally provided with a dataset for evaluation, or you can implement it yourself. I'd suggest the py-motmetrics repository.

A simple implementation of the IOU calculation might look like:

def iou(a, b):
    """
    Description
    -----------
    Calculates intersection over union for a single pair of boxes

    Parameters
    ----------
    a : tensor or sequence of 4 floats [x1, y1, x2, y2]
        bounding box
    b : tensor or sequence of 4 floats [x1, y1, x2, y2]
        bounding box

    Returns
    -------
    iou : float in [0, 1]
        intersection over union of a and b
    """
    
    area_a = (a[2]-a[0]) * (a[3]-a[1])
    area_b = (b[2]-b[0]) * (b[3]-b[1])
    
    minx = max(a[0], b[0])
    maxx = min(a[2], b[2])
    miny = max(a[1], b[1])
    maxy = min(a[3], b[3])
    
    intersection = max(0, maxx-minx) * max(0,maxy-miny)
    union = area_a + area_b - intersection
    iou = intersection/union
    
    return iou
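For example, with two made-up, partially overlapping boxes:

box_a = [0, 0, 10, 10]
box_b = [5, 5, 15, 15]
print(iou(box_a, box_b))  # 25 / (100 + 100 - 25) = 0.1428...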

So I've implemented the F1 score to be calculated globally, that is, for the entire dataset.

The implementation below gives an example of determining the F1-score for a validation set.

The outputs of the model are in a dictionary format, so we need to place them into lists like this:

predicted_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]

train_index: index of the image that the specific bbox comes from
class_prediction: integer value representing the class prediction
prob_score: output objectness score for the bbox
x1, y1, x2, y2: (x1, y1) and (x2, y2) bbox coordinates

gt_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]

Where prob_score is just 1 for the ground truth inputs (it could be anything really, as long as that dimension is specified and filled in).
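For illustration, this is roughly how one prediction dictionary and its matching target dictionary could be flattened into those row formats (the helper name to_rows is my own, not part of the code below):

def to_rows(image_id, pred, target):
    # pred and target are torchvision-style dicts with 'boxes', 'labels' (and 'scores' for pred)
    predicted_rows = []
    for box, label, score in zip(pred['boxes'], pred['labels'], pred['scores']):
        predicted_rows.append([image_id, label.item(), score.item(),
                               box[0].item(), box[1].item(), box[2].item(), box[3].item()])

    gt_rows = []
    for box, label in zip(target['boxes'], target['labels']):
        # prob_score is fixed to 1 for ground truth boxes, as described above
        gt_rows.append([image_id, label.item(), 1,
                        box[0].item(), box[1].item(), box[2].item(), box[3].item()])

    return predicted_rows, gt_rows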

IoU is also implemented in torchvision, which makes everything a lot easier.
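For example, torchvision.ops.box_iou takes an [N, 4] and an [M, 4] tensor in (x1, y1, x2, y2) format and returns the [N, M] matrix of pairwise IoUs (coordinates below are made up):

import torch
import torchvision

boxes_a = torch.tensor([[0.0, 0.0, 10.0, 10.0]])  # [1, 4]
boxes_b = torch.tensor([[5.0, 5.0, 15.0, 15.0]])  # [1, 4]

iou_matrix = torchvision.ops.box_iou(boxes_a, boxes_b)
print(iou_matrix)  # tensor([[0.1429]])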

I hope this helps others, as I couldn't find another implementation of F1 score in object detection anywhere else.

from collections import Counter
import math

import torch
import torchvision

cpu_device = torch.device("cpu")
iou_threshold = 0.5  # IOU threshold used to match detections to ground truth (0.5 assumed here)

model_test.eval()
with torch.no_grad():
  global_tp = []
  global_fp = []
  global_gt = []


  valid_df_unique = get_unique(valid_df['image_id'])

  for images, targets in valid_data_loader:
    images = list(img.to(device) for img in images)
    outputs = model_test(images)
    outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
    predictions = apply_nms(outputs[0], iou_thresh=0.1)


    # looping through each class
    for c in range(num_classes):
      # detections (list): predicted_boxes that are class c
      detections = []
      # ground_truths (list): gt_boxes that are class c
      ground_truths = []

    

      for b,la,s in zip(predictions['boxes'], predictions['labels'],predictions['scores']): 
        updated_detection_array = [targets[0]['image_id'].item(), la.item(), s.item(), b[0].item(),b[1].item(),b[2].item(),b[3].item()]
        if la.item() == c:
          detections.append(updated_detection_array)

      for b,la in zip(targets[0]['boxes'], targets[0]['labels']): 
        updated_gt_array = [targets[0]['image_id'].item(), la.item(), 1, b[0].item(),b[1].item(),b[2].item(),b[3].item()]
        if la.item() == c:
          ground_truths.append(updated_gt_array)
          global_gt.append(updated_gt_array)


    
      # use Counter to create a dictionary where key is image # and value
      # is the # of bboxes in the given image
      amount_bboxes = Counter([gt[0] for gt in ground_truths])

      # goal: keep track of the gt bboxes we have already "detected" with prior predicted bboxes
      # key: image #
      # value: tensor of 0's (size is equal to # of bboxes in the given image)
      for key, value in amount_bboxes.items():
        amount_bboxes[key] = torch.zeros(value)

      # sort over the probability scores of the detections (highest first)
      detections.sort(key = lambda x: x[2], reverse = True)
      
      true_Positives = torch.zeros(len(detections))
      false_Positives = torch.zeros(len(detections))
      total_gt_bboxes = len(ground_truths)

      false_positives_frame = []
      true_positives_frame = []


      # iterate through all detections in given class c
      for detection_index, detection in enumerate(detections):
        # detection[0] indicates image #
        # ground_truth_image: the gt bbox's that are in same image as detection
        ground_truth_image = [bbox for bbox in ground_truths if bbox[0] == detection[0]]

        # num_gt_boxes: number of ground truth boxes in given image
        num_gt_boxes = len(ground_truth_image)
        best_iou = 0
        best_gt_index = 0


        for index, gt in enumerate(ground_truth_image):
          
          iou = torchvision.ops.box_iou(torch.tensor(detection[3:]).unsqueeze(0), 
                                        torch.tensor(gt[3:]).unsqueeze(0))
          
          if iou > best_iou:
            best_iou = iou
            best_gt_index = index

        if best_iou > iou_threshold:
          # check if gt_bbox with best_iou was already covered by previous detection with higher confidence score
          # amount_bboxes[detection[0]][best_gt_index] == 0 if not discovered yet, 1 otherwise
          if amount_bboxes[detection[0]][best_gt_index] == 0:
            true_Positives[detection_index] = 1
            amount_bboxes[detection[0]][best_gt_index] = 1
            true_positives_frame.append(detection)
            global_tp.append(detection)

          else:
            false_Positives[detection_index] = 1
            false_positives_frame.append(detection)
            global_fp.append(detection)
        else:
          false_Positives[detection_index] = 1
          false_positives_frame.append(detection)
          global_fp.append(detection)


# remove nan values from ground truth list as list contains every mitosis image row entry (including images with no targets)
global_gt_updated = []
for gt in global_gt:
  if not math.isnan(gt[3]):
    global_gt_updated.append(gt)


global_fn = len(global_gt_updated) - len(global_tp)

precision = len(global_tp)/ (len(global_tp)+ len(global_fp))
recall = len(global_tp)/ (len(global_tp) + global_fn)

f1_score =  2* (precision * recall)/ (precision + recall)

print(len(global_tp))
print(recall)
print(precision)
print(f1_score)

