
How to reduce the number of training steps in Tensorflow's Object Detection API?

I am following Dat Tran's example to train my own object detector with TensorFlow's Object Detection API.

I successfully started training on my custom objects. I am using a CPU to train the model, but it takes around 3 hours to complete 100 training steps. I suppose I have to change some parameter in the .config file.

I tried to convert the .ckpt to a .pb file. I referred to this post, but I was still not able to convert it.

1) How do I reduce the number of training steps?
2) Is there a way to convert a .ckpt to a .pb file?

I don't think you can reduce the number of training steps, but you can stop at any checkpoint (.ckpt) and then convert it to a .pb file.
From the TensorFlow models git repository you can use export_inference_graph.py with the following command:

python tensorflow_models/object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path architecture_used_while_training.config \
--trained_checkpoint_prefix path_to_saved_ckpt/model.ckpt-NUMBER \
--output_directory model/

where NUMBER refers to your latest saved checkpoint file number. However, you can use an older checkpoint file if it looks better in TensorBoard.
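The export writes frozen_inference_graph.pb into the output directory. As a minimal sketch for loading it afterwards (the model/ path comes from the command above; the TF1-style compat imports are my assumption):

# Load the exported frozen graph; all weights are baked in as constants,
# so no checkpoint restore is needed.
import tensorflow.compat.v1 as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("model/frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name="")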

1) I'm afraid there is no effective way to just "reduce" the number of training steps. Using bigger batch sizes may lead to "faster" training (as in, reaching high accuracy in a lower number of steps), but each step will take longer to compute since you're running on your CPU. Playing around with the input image resolution might give you a speedup, at the price of lower accuracy. Both knobs are set in the pipeline .config file (see the excerpt below). You should really consider moving to a machine with a GPU.
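A purely illustrative excerpt, assuming an SSD-style model with a fixed_shape_resizer (the field names are from the Object Detection API pipeline config; the values are made up):

model {
  ssd {
    image_resizer {
      fixed_shape_resizer {
        height: 300  # lower these to speed up each step, at some accuracy cost
        width: 300
      }
    }
  }
}
train_config {
  batch_size: 24  # larger batches may reach a given accuracy in fewer steps
}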

2) .pb files (and their corresponding text version .pbtxt ) by default contain only the definition of your graph. If you freeze your graph, you take a checkpoint, get all the variables defined in the graph, convert them to constants and assign them the values stored in the checkpoint. You typically do this to ship your trained model to whoever will use it, but this is useless in the training stage.
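To make the freezing step concrete, here is a minimal sketch using TF1-style APIs; the checkpoint prefix and the output node name are placeholders, not values taken from your model:

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

with tf.Session() as sess:
    # Rebuild the graph from the .meta file and restore variable values.
    saver = tf.train.import_meta_graph("model.ckpt-1234.meta")
    saver.restore(sess, "model.ckpt-1234")

    # Replace every variable with a constant holding its checkpoint value.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(),
        output_node_names=["detection_boxes"])  # placeholder node name

    # Serialize the now self-contained graph to a single .pb file.
    with tf.gfile.GFile("frozen_graph.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())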

I would highly recommend finding a way to speed up your per-training-step running time rather than reducing the number of training steps. The best way is to get your hands on a GPU. If you can't do this, you can look into reducing image resolution or using a lighter network.

For converting to a frozen inference graph (the .pb file), please see the documentation here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md

Yes, there is one parameter in the .config file where you can reduce the number of steps as much as you want: num_steps. Note that it sets the total number of training steps (i.e., batches processed), not epochs.
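For example, in the train_config block of your pipeline .config (the value is illustrative):

train_config {
  num_steps: 20000  # total number of training steps; lower it to stop earlier
}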

But please keep in mind that it is not recommended to reduce it too much, because then your loss will not have decreased enough and the model will give you bad output.

So keep watching the loss; once it drops below 1, you can start testing your model separately while training continues.
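You can watch the loss curve live in TensorBoard while training runs (assuming your training job writes its event files to the directory you pass here):

tensorboard --logdir=path_to_your_training_dir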

1. Yup, there is a way to change the number of training steps:

try this,

python model_main_tf2.py \
--pipeline_config_path="config_path_here" \
--num_train_steps=5000 \
--model_dir="model_dir_here" \
--alsologtostderr

Here I set the number of training steps to 5000.

2. Yup, there is a way to convert checkpoints into a .pb file:

try this,

python exporter_main_v2.py \
--trained_checkpoint_dir="checkpoint_dir_here" \
--pipeline_config_path="config_path_here" \
--output_directory="output_dir_here"

This will create an output directory containing a copy of the checkpoint and a saved_model/ folder with the saved_model.pb file.
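As a quick sanity check, you can load the exported SavedModel back in Python; the directory matches --output_directory above, and the 320x320 dummy input size is just an assumption for illustration:

import tensorflow as tf

# exporter_main_v2.py writes the SavedModel under <output_directory>/saved_model
detect_fn = tf.saved_model.load("output_dir_here/saved_model")

# The serving signature expects a batched uint8 image tensor.
dummy_image = tf.zeros([1, 320, 320, 3], dtype=tf.uint8)
detections = detect_fn(dummy_image)
print(list(detections.keys()))  # e.g. detection_boxes, detection_scores, ...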
