TensorFlow Object Detection API - How to train on the COCO dataset and achieve the same mAP as the reported one?
I'm trying to reproduce the officially reported mAP of EfficientDet D3 in the Object Detection API by training on COCO from a pretrained EfficientNet backbone. The official COCO mAP is 45.4%, yet all I can achieve is around 14%. I don't need to reach the same value, but I would like to at least come close to it.
I am loading the EfficientNet B3 checkpoint pretrained on ImageNet found here, and using the config file found here. The only parameters I changed are the batch size (to fit into an RTX 3090), the learning rate (0.08 yielded loss = NaN, so I reduced it to 0.01), and the number of steps, which I increased to 600k. This is my pipeline.config file:
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 90
    add_background_class: false
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 3
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 896
        max_dimension: 896
        pad_to_max_dimension: true
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 160
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          force_use_bias: true
          activation: SWISH
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true
            decay: 0.99
            epsilon: 0.001
          }
        }
        num_layers_before_predictor: 4
        kernel_size: 3
        use_depthwise: true
      }
    }
    feature_extractor {
      type: 'ssd_efficientnet-b3_bifpn_keras'
      bifpn {
        min_level: 3
        max_level: 7
        num_iterations: 6
        num_filters: 160
      }
      conv_hyperparams {
        force_use_bias: true
        activation: SWISH
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.99,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 1.5
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.5
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  fine_tune_checkpoint: "/API/Tensorflow/models/research/object_detection/test_data/efficientnet_b3/efficientnet_b3/ckpt-0"
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint_type: "classification"
  batch_size: 2
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  use_bfloat16: false
  num_steps: 600000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_scale_crop_and_pad_to_square {
      output_size: 896
      scale_min: 0.1
      scale_max: 2.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-2
          total_steps: 600000
          warmup_learning_rate: .001
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}
train_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_train.record-00000-of-00100"
  }
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}
eval_input_reader: {
  label_map_path: "/DATASETS/COCO/classes.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/DATASETS/COCO/coco_val.record-00000-of-00050"
  }
}
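(A side note on the input readers above: input_path accepts glob patterns, so a reader meant to consume every shard of the sharded COCO record files, rather than only the first one, would use a wildcard. A sketch, assuming the usual ?????-of-00100 sharding convention:)

tf_record_input_reader {
  # hypothetical glob covering all 100 training shards
  input_path: "/DATASETS/COCO/coco_train.record-?????-of-00100"
}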
These are the results:

[training loss and eval metric plots omitted]
Your loss is too high. A loss that stays around 1 indicates that your model is not really being trained: it is not learning the weights. There are a couple of things you can check.

Two suggestions. First, batch size is an essential hyper-parameter in deep learning. Different batch sizes may lead to different training and testing accuracies, and choosing a suitable batch size is crucial when training a neural network. [Source]

Using a batch size of 1 (or 2) for a model with so many parameters may be the reason for the low accuracy, and a higher number of epochs does not compensate for a batch size that is too small. One way to adjust it is sketched below.
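If GPU memory caps the batch size, a common heuristic is to scale the learning rate down linearly with the batch size instead of picking a value by hand. A sketch of the adjusted optimizer block, assuming the published config's baseline of batch_size: 128 with learning_rate_base: 8e-2 (the batch size of 8 and the scaled warmup value are illustrative assumptions):

train_config: {
  batch_size: 8    # as large as fits on the GPU
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          # linear scaling heuristic: 8e-2 * (8 / 128) = 5e-3
          learning_rate_base: 5e-3
          total_steps: 600000
          # warmup scaled by the same factor (assumption)
          warmup_learning_rate: 5e-4
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}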
Another point I noticed is that the paper makes use of jitter for augmentation.
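Note that the posted config already applies scale jitter to the whole image via random_scale_crop_and_pad_to_square (scale_min: 0.1, scale_max: 2.0). If per-box jitter is what is meant, the API also exposes it as a preprocessing option; a sketch, with an illustrative ratio:

data_augmentation_options {
  random_jitter_boxes {
    ratio: 0.05    # randomly perturb each box corner by up to 5% of the box size
  }
}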