TensorFlow Object Detection API - How to train on the COCO dataset and achieve the same mAP as the reported one?
I'm trying to reproduce the officially reported mAP of EfficientDet D3 in the Object Detection API by training on COCO from a pretrained EfficientNet backbone. The official COCO mAP is 45.4%, yet all I can achieve is around 14%. I don't need to reach the same value, but I would like to at least come close to it.
I am loading the EfficientNet B3 checkpoint pretrained on ImageNet found here, and using the config file found here. The only parameters I changed are the batch size (to fit into an RTX 3090), the learning rate (0.08 yielded loss = NaN, so I reduced it to 0.01), and the number of steps, which I increased to 600k. This is my pipeline.config file:
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 90
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 896
        max_dimension: 896
        pad_to_max_dimension: true
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 160
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b3_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 6
        num_filters: 160
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.99,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  fine_tune_checkpoint: "/API/Tensorflow/models/research/object_detection/test_data/efficientnet_b3/efficientnet_b3/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "classification"
  batch_size: 2
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: false
  num_steps: 600000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 896
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-2
          total_steps: 600000
          warmup_learning_rate: .001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
train_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_train.record-00000-of-00100"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}
eval_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_val.record-00000-of-00050"
  }
}
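(A side note on the input readers above: input_path accepts glob patterns, so a reader meant to consume every shard of the sharded COCO record files, rather than only the first one, would use a wildcard. A sketch, assuming the usual ?????-of-00100 sharding convention:)

tf_record_input_reader {
  # hypothetical glob covering all 100 training shards
  input_path: "/DATASETS/COCO/coco_train.record-?????-of-00100"
}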
These are the results:

[training loss and eval metric plots omitted]
Your loss is too high. A loss that stays around 1 indicates that your model is not really being trained: it is not learning the weights. There are a couple of things you can check.

Two suggestions. First, batch size is an essential hyper-parameter in deep learning. Different batch sizes may lead to different training and testing accuracies, and choosing a suitable batch size is crucial when training a neural network. [Source]

Using a batch size of 1 (or 2) for a model with so many parameters may be the reason for the low accuracy, and a higher number of epochs does not compensate for a batch size that is too small. One way to adjust it is sketched below.
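If GPU memory caps the batch size, a common heuristic is to scale the learning rate down linearly with the batch size instead of picking a value by hand. A sketch of the adjusted optimizer block, assuming the published config's baseline of batch_size: 128 with learning_rate_base: 8e-2 (the batch size of 8 and the scaled warmup value are illustrative assumptions):

train_config: {
  batch_size: 8    # as large as fits on the GPU
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          # linear scaling heuristic: 8e-2 * (8 / 128) = 5e-3
          learning_rate_base: 5e-3
          total_steps: 600000
          # warmup scaled by the same factor (assumption)
          warmup_learning_rate: 5e-4
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}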
Another point I noticed is that the paper makes use of jitter for augmentation.
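Note that the posted config already applies scale jitter to the whole image via random_scale_crop_and_pad_to_square (scale_min: 0.1, scale_max: 2.0). If per-box jitter is what is meant, the API also exposes it as a preprocessing option; a sketch, with an illustrative ratio:

data_augmentation_options {
  random_jitter_boxes {
    ratio: 0.05    # randomly perturb each box corner by up to 5% of the box size
  }
}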