
How to reduce the number of training steps in Tensorflow's Object Detection API?

I am following Dat Tran's example to train my own object detector with TensorFlow's Object Detection API.

I successfully started training on my custom objects. I am using a CPU to train the model, but it takes around 3 hours to complete 100 training steps. I suppose I have to change some parameter in the .config file.

I tried to convert the .ckpt to a .pb file. I referred to this post, but I was still not able to convert it.

1) How do I reduce the number of training steps?
2) Is there a way to convert a .ckpt to a .pb file?

I don't think you can reduce the number of training steps, but you can stop at any checkpoint (.ckpt) and then convert it to a .pb file.
From the TensorFlow models git repository you can use export_inference_graph.py with the following command:

python tensorflow_models/object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path architecture_used_while_training.config \
--trained_checkpoint_prefix path_to_saved_ckpt/model.ckpt-NUMBER \
--output_directory model/

where NUMBER refers to your latest saved checkpoint file number. However, you can use an older checkpoint file if it looks better in TensorBoard.
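The export writes frozen_inference_graph.pb into the output directory. As a minimal sketch for loading it afterwards (the model/ path comes from the command above; the TF1-style compat imports are my assumption):

# Load the exported frozen graph; all weights are baked in as constants,
# so no checkpoint restore is needed.
import tensorflow.compat.v1 as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("model/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")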

1) I'm afraid there is no effective way to just "reduce" the number of training steps. Using bigger batch sizes may lead to "faster" training (as in, reaching high accuracy in a lower number of steps), but each step will take longer to compute since you're running on your CPU. Playing around with the input image resolution might give you a speedup, at the price of lower accuracy. Both knobs are set in the pipeline .config file (see the excerpt below). You should really consider moving to a machine with a GPU.
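A purely illustrative excerpt, assuming an SSD-style model with a fixed_shape_resizer (the field names are from the Object Detection API pipeline config; the values are made up):

model {
  ssd {
    image_resizer {
      fixed_shape_resizer {
        height: 300  # lower these to speed up each step, at some accuracy cost
        width: 300
      }
    }
  }
}
train_config {
  batch_size: 24  # larger batches may reach a given accuracy in fewer steps
}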

2) .pb files (and their corresponding text version .pbtxt ) by default contain only the definition of your graph. If you freeze your graph, you take a checkpoint, get all the variables defined in the graph, convert them to constants and assign them the values stored in the checkpoint. You typically do this to ship your trained model to whoever will use it, but this is useless in the training stage.
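To make the freezing step concrete, here is a minimal sketch using TF1-style APIs; the checkpoint prefix and the output node name are placeholders, not values taken from your model:

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.Session() as sess:
    # Rebuild the graph from the .meta file and restore variable values.
    saver = tf.train.import_meta_graph("model.ckpt-1234.meta")
    saver.restore(sess, "model.ckpt-1234")

    # Replace every variable with a constant holding its checkpoint value.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(),
        output_node_names=["detection_boxes"])  # placeholder node name

    # Serialize the now self-contained graph to a single .pb file.
    with tf.gfile.GFile("frozen_graph.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())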

I would highly recommend finding a way to speed up your per-training-step running time rather than reducing the number of training steps. The best way is to get your hands on a GPU. If you can't do this, you can look into reducing image resolution or using a lighter network.

For converting to a frozen inference graph (the .pb file), please see the documentation here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md

Yes, there is one parameter in the .config file where you can reduce the number of steps as much as you want: num_steps. Note that it sets the total number of training steps (i.e., batches processed), not epochs.
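For example, in the train_config block of your pipeline .config (the value is illustrative):

train_config {
  num_steps: 20000  # total number of training steps; lower it to stop earlier
}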

But please keep in mind that it is not recommended to reduce it too much, because then your loss will not have decreased enough and the model will give you bad output.

So keep watching the loss; once it drops below 1, you can start testing your model separately while training continues.
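You can watch the loss curve live in TensorBoard while training runs (assuming your training job writes its event files to the directory you pass here):

tensorboard --logdir=path_to_your_training_dir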

1. Yup, there is a way to change the number of training steps:

try this,

python model_main_tf2.py \
--pipeline_config_path="config_path_here" \
--num_train_steps=5000 \
--model_dir="model_dir_here" \
--alsologtostderr

Here I set the number of training steps to 5000.

2. Yup, there is a way to convert checkpoints into a .pb file:

try this,

python exporter_main_v2.py \
--trained_checkpoint_dir="checkpoint_dir_here" \
--pipeline_config_path="config_path_here" \
--output_directory="output_dir_here"

This will create an output directory containing a copy of the checkpoint and a saved_model/ folder with the saved_model.pb file.
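As a quick sanity check, you can load the exported SavedModel back in Python; the directory matches --output_directory above, and the 320x320 dummy input size is just an assumption for illustration:

import tensorflow as tf

# exporter_main_v2.py writes the SavedModel under <output_directory>/saved_model
detect_fn = tf.saved_model.load("output_dir_here/saved_model")

# The serving signature expects a batched uint8 image tensor.
dummy_image = tf.zeros([1, 320, 320, 3], dtype=tf.uint8)
detections = detect_fn(dummy_image)
print(list(detections.keys()))  # e.g. detection_boxes, detection_scores, ...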
