I am fine-tuning SSD-MobileNetV3 Large and SSD-MobileDet-CPU on the COCO 2017 dataset, restricted to the book class. I created a new dataset for this, inspected it, and it looks good. I also modified the config file to fit my needs. When I start training, it ignores the 'fine_tune_checkpoint' given in the config file and starts from scratch. However, if I repeat the process with the checkpoint placed in the 'model_dir' directory instead, training tries to restore it but fails with an error because the number of classes differs. How can I make the training process restore the checkpoint properly? I also tried the standard COCO dataset with all 90 classes: 'fine_tune_checkpoint' is again ignored when training starts, but if I put the checkpoint in 'model_dir', it is restored properly.
My config file is as below.
# SSDLite with MobileDet-CPU feature extractor.
# Reference: Xiong & Liu et al., https://arxiv.org/abs/2004.14525
# Trained on COCO, initialized from scratch.
#
# 0.45B MulAdds, 4.21M Parameters. Latency is 113ms on Pixel 1 CPU.
# Achieves 24.0 mAP on COCO14 minival dataset.
# Achieves 23.5 mAP on COCO17 val dataset.
#
# This config is TPU compatible.
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 3
        use_depthwise: true
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobiledet_cpu'
      min_depth: 16
      depth_multiplier: 1.0
      use_depthwise: true
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: false
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: true
      }
      score_converter: SIGMOID
    }
  }
}
train_config: {
  batch_size: 64
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 1
  num_steps: 800000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 0.8
          total_steps: 800000
          warmup_learning_rate: 0.13333
          warmup_steps: 100
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "./checkpoints/model.ckpt-400000"
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V1
  load_all_detection_checkpoint_vars: true
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "./tf_record_coco_books/coco_train.record"
  }
  label_map_path: "./tf_record_coco_books/label_map.pbtxt"
}
eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "./tf_record_coco_books/coco_val.record"
  }
  label_map_path: "./tf_record_coco_books/label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
You have to specify a model_dir that is different from the directory from which you are loading the previously trained checkpoint.
At the very beginning of training, the TensorFlow Object Detection API training script (either the current model_main or the legacy/train) creates a new checkpoint corresponding to your new config in your model_dir and then trains from this checkpoint. If that directory already contains the pre-trained checkpoints, it will indeed raise an error about the number of classes.
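A quick check before launching training can catch this layout problem. The sketch below is not part of the Object Detection API: `check_dirs` is a hypothetical helper, and it assumes TF1-style checkpoints whose files share a prefix such as `model.ckpt-400000` (with `.index`, `.meta` and `.data-*` suffixes).

```python
import glob
import os


def check_dirs(model_dir, fine_tune_checkpoint):
    """Return a list of problems with the model_dir / checkpoint layout.

    Hypothetical helper for illustration; not part of the Object Detection API.
    """
    problems = []
    model_dir = os.path.abspath(model_dir)
    ckpt_dir = os.path.abspath(os.path.dirname(fine_tune_checkpoint))

    # The pre-trained checkpoint should live outside model_dir.
    if model_dir == ckpt_dir:
        problems.append("model_dir is the same directory as fine_tune_checkpoint")

    # Checkpoint files already present in model_dir would be picked up
    # as training state to resume from.
    leftovers = glob.glob(os.path.join(glob.escape(model_dir), "*.index"))
    leftovers += glob.glob(os.path.join(glob.escape(model_dir), "checkpoint"))
    if leftovers:
        problems.append("model_dir already contains checkpoint files")

    # The checkpoint prefix should match at least one file on disk.
    if not glob.glob(glob.escape(fine_tune_checkpoint) + "*"):
        problems.append("no files match the fine_tune_checkpoint prefix")

    return problems
```

With the paths from the question, `check_dirs("./train_books", "./checkpoints/model.ckpt-400000")` should return an empty list when the layout is fine; `./train_books` is only a placeholder name for a fresh training directory.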
If that doesn't work, you could also change this field in your config file:
fine_tune_checkpoint_type: "detection"
to:
fine_tune_checkpoint_type: "fine_tune"
This is a known issue in the Object Detection API: https://github.com/tensorflow/models/issues/8892#issuecomment-680207038
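Editing the field by hand is enough, but if the config is patched from a script, the change could be sketched as a plain text substitution. `set_checkpoint_type` is a hypothetical helper, not an Object Detection API function, and it assumes the text-format `field: "value"` syntax used in the config from the question.

```python
import re


def set_checkpoint_type(config_text, new_type):
    """Return config_text with fine_tune_checkpoint_type set to new_type.

    Hypothetical helper; operates on the raw text of the pipeline config.
    """
    # Match the field name, the colon, and the quoted value that follows it.
    pattern = r'(fine_tune_checkpoint_type\s*:\s*)"[^"]*"'
    new_text, count = re.subn(
        pattern,
        lambda m: m.group(1) + '"' + new_type + '"',
        config_text,
    )
    if count == 0:
        raise ValueError("fine_tune_checkpoint_type not found in config")
    return new_text
```

The pattern requires the `_type` suffix, so the separate `fine_tune_checkpoint` path field is left untouched.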
The issue arises from line 446 in model_lib.py:
load_pretrained = hparams.load_pretrained if hparams else False
because one of the previous commits changed hparams to None, so load_pretrained is always False. Setting it to True and reinstalling the object_detection library fixes the problem.
I've mentioned this in the related GitHub issue: https://github.com/tensorflow/models/issues/9284
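To see why the flag is lost, the conditional expression can be evaluated in isolation. This is only an illustration: `resolve_load_pretrained` is a hypothetical wrapper around the quoted expression, and `SimpleNamespace` stands in for a real hparams object.

```python
from types import SimpleNamespace


def resolve_load_pretrained(hparams):
    # Same conditional expression as the line quoted from model_lib.py.
    return hparams.load_pretrained if hparams else False


# hparams is None: the expression falls through to False,
# so the fine-tune checkpoint is never loaded.
print(resolve_load_pretrained(None))  # False

# With an hparams object carrying the flag, the value is honoured.
print(resolve_load_pretrained(SimpleNamespace(load_pretrained=True)))  # True
```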