How can I reduce the number of training steps in TensorFlow's Object Detection API?

Question

I am following Dat Tran's example to train my own object detector with TensorFlow's Object Detection API.

I successfully started training on my custom objects. I am training the model on a CPU, and it takes about 3 hours to complete 100 training steps. I suppose I have to change some parameter in the .config file.

I also tried to convert the .ckpt to a .pb file following this post, but I still wasn't able to convert it.

1) How can I reduce the number of training steps?
2) Is there a way to convert a .ckpt to a .pb?

Answer 1

I don't think you can reduce the number of training steps, but you can stop at any checkpoint (ckpt) and then convert it to a .pb file.
From the TensorFlow models git repository, you can use export_inference_graph.py with the following command:

python tensorflow_models/object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path architecture_used_while_training.config \
--trained_checkpoint_prefix path_to_saved_ckpt/model.ckpt-NUMBER \
--output_directory model/

where NUMBER is the number of your latest saved checkpoint file. You can also use an older checkpoint if it looks better in TensorBoard.
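Picking NUMBER by hand can be scripted. Below is a minimal sketch, assuming TF1-style checkpoint files named model.ckpt-NUMBER.index/.data/.meta in a single directory; latest_checkpoint_prefix is a hypothetical helper, not part of the API (TensorFlow's own tf.train.latest_checkpoint reads the `checkpoint` state file instead). It compares step numbers numerically, so 1200 beats 900.

```python
import os
import re
from typing import Iterable, Optional

# Matches TF1-style checkpoint index files such as "model.ckpt-12345.index".
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_prefix(filenames: Iterable[str], ckpt_dir: str) -> Optional[str]:
    """Return the prefix of the highest-numbered checkpoint, or None if there is none.

    `filenames` is a listing of `ckpt_dir` (e.g. from os.listdir), passed in
    separately so the function can be tested without touching the disk.
    """
    best_step = -1
    for name in filenames:
        match = _CKPT_INDEX.match(name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
    if best_step < 0:
        return None
    # The prefix has no ".index" suffix: this is the form the export script expects.
    return os.path.join(ckpt_dir, f"model.ckpt-{best_step}")


if __name__ == "__main__":
    listing = ["checkpoint", "model.ckpt-900.index", "model.ckpt-1200.index",
               "model.ckpt-1200.data-00000-of-00001", "model.ckpt-1200.meta"]
    print(latest_checkpoint_prefix(listing, "path_to_saved_ckpt"))
```

In practice you would call it as latest_checkpoint_prefix(os.listdir(d), d).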

Answer 2

1) I'm afraid there is no effective way to simply "reduce" the training steps. Using bigger batch sizes may lead to "faster" training (as in, reaching high accuracy in fewer steps), but each step will take longer to compute, since you're running on your CPU. Lowering the input image resolution might give you a speedup, at the price of lower accuracy. You should really consider moving to a machine with a GPU.
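The trade-off between batch size and step count in point 1) is plain arithmetic. A small Python sketch; the 1,000-image dataset and the batch sizes are made-up illustrations, not values from the question:

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def epochs_covered(num_steps: int, batch_size: int, num_examples: int) -> float:
    """How many passes over the data `num_steps` steps amount to."""
    return num_steps * batch_size / num_examples


if __name__ == "__main__":
    # Hypothetical dataset of 1,000 labelled images, 100 training steps.
    for batch in (1, 8, 24):
        print(batch, steps_per_epoch(1000, batch), epochs_covered(100, batch, 1000))
```

With 100 steps, a batch size of 8 covers 0.8 epochs and a batch size of 24 covers 2.4, but each of those larger steps processes three times as many images, so on a CPU it also takes roughly three times as long.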

2) .pb files (and their text counterpart, .pbtxt) by default contain only the definition of your graph. Freezing a graph means taking a checkpoint, getting all the variables defined in the graph, converting them to constants, and assigning them the values stored in the checkpoint. You typically do this to ship your trained model to whoever will use it, but it is useless during training.
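The freezing step described in point 2) can be pictured with a toy model in plain Python. This is not TensorFlow's API: Node, freeze and the dict-based graph are hypothetical stand-ins that only illustrate the idea of swapping variables for constants filled in from a checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One operation in a toy graph definition (a stand-in for a real graph node)."""
    name: str
    op: str                                   # e.g. "Placeholder", "Variable", "Const", "MatMul"
    inputs: List[str] = field(default_factory=list)
    value: Optional[object] = None            # only set on "Const" nodes


def freeze(graph: Dict[str, Node], checkpoint: Dict[str, object]) -> Dict[str, Node]:
    """Return a copy of `graph` where every Variable became a Const holding its checkpoint value."""
    frozen: Dict[str, Node] = {}
    for name, node in graph.items():
        if node.op == "Variable":
            if name not in checkpoint:
                raise KeyError(f"checkpoint has no value for variable {name!r}")
            frozen[name] = Node(name, "Const", [], checkpoint[name])
        else:
            # Non-variable nodes keep their op and wiring unchanged.
            frozen[name] = Node(node.name, node.op, list(node.inputs), node.value)
    return frozen


if __name__ == "__main__":
    graph = {
        "image": Node("image", "Placeholder"),
        "weights": Node("weights", "Variable"),
        "logits": Node("logits", "MatMul", ["image", "weights"]),
    }
    print(freeze(graph, {"weights": [[0.5, -0.25]]})["weights"])
```

The original graph is left untouched, which mirrors why a frozen file is for shipping a trained model rather than for continuing training: the trainable variables no longer exist in it.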

Answer 3

I would highly recommend finding a way to speed up the running time of each training step rather than reducing the number of training steps. The best way is to get your hands on a GPU. If you can't, you can look into reducing the image resolution or using a lighter network.
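As a rough rule of thumb (an approximation, not a measurement), the compute of convolutional layers scales with the number of input pixels, so halving both sides of the input cuts that part of the work to about a quarter. The sizes below are hypothetical; in the Object Detection API the input size is set in the image_resizer block of the .config, and relative_pixel_cost is an illustrative helper.

```python
from typing import Tuple


def relative_pixel_cost(old_hw: Tuple[int, int], new_hw: Tuple[int, int]) -> float:
    """Ratio new_pixels / old_pixels: a rough proxy for how conv-layer compute changes."""
    old_h, old_w = old_hw
    new_h, new_w = new_hw
    return (new_h * new_w) / (old_h * old_w)


if __name__ == "__main__":
    # Hypothetical sizes: shrinking a 600x600 input to 300x300.
    print(relative_pixel_cost((600, 600), (300, 300)))  # 0.25
```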

To convert to a frozen inference graph (the .pb file), please see the documentation here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md

Answer 4

Yes, there is a parameter in the .config file that lets you reduce the number of steps as much as you want: num_steps, in the train_config section. Note that it is the total number of training steps (each step processes one batch), not the number of epochs.
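If you prefer to script the change, here is a minimal sketch that rewrites num_steps in the text of a pipeline .config with a regular expression. It assumes the field is written as `num_steps: <integer>` on its own line; set_num_steps is a hypothetical helper, and editing the config through its protobuf definition would be more robust.

```python
import re

# Matches a line like "  num_steps: 200000" (as found inside train_config).
_NUM_STEPS = re.compile(r"^([ \t]*num_steps[ \t]*:[ \t]*)\d+", re.MULTILINE)


def set_num_steps(config_text: str, num_steps: int) -> str:
    """Return config_text with every num_steps value replaced by `num_steps`."""
    new_text, count = _NUM_STEPS.subn(rf"\g<1>{num_steps}", config_text)
    if count == 0:
        raise ValueError("no num_steps field found in the config text")
    return new_text


if __name__ == "__main__":
    sample = "train_config: {\n  batch_size: 24\n  num_steps: 200000\n}\n"
    print(set_num_steps(sample, 5000))
```

If the config uses a learning-rate schedule with its own step counts (for example total_steps in cosine_decay_learning_rate), you may want to shorten that as well so the schedule still fits the shorter run.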

But keep in mind that reducing it too much is not recommended: if you do, the loss will not decrease enough and you will get poor results.

So keep watching the loss. Once it drops below 1, you can start testing your model separately while training keeps running.
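A minimal sketch of that rule in Python, assuming you already have (step, loss) pairs, for example read off TensorBoard. The threshold of 1 is this answer's heuristic; the moving-average window is an added assumption so that one lucky batch does not trigger it, and first_step_below is a hypothetical helper.

```python
from collections import deque
from typing import Iterable, Optional, Tuple


def first_step_below(history: Iterable[Tuple[int, float]], threshold: float = 1.0,
                     window: int = 3) -> Optional[int]:
    """Return the first step at which the moving average of the loss is below `threshold`.

    `history` holds (step, loss) pairs in step order; averaging over `window`
    points smooths out single noisy batches. Returns None if it never happens.
    """
    recent = deque(maxlen=window)
    for step, loss in history:
        recent.append(loss)
        if len(recent) == window and sum(recent) / window < threshold:
            return step
    return None
```

With window=1 it reduces to "the first logged loss under the threshold".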

Answer 5

1. Yes, there is a way to change the number of training steps.

Try this:

python model_main_tf2.py --pipeline_config_path="config_path_here" --num_train_steps=5000 --model_dir="model_dir_here" --alsologtostderr

Here I set the number of training steps to 5000.
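When trying several step limits, it can help to build that command in Python. A sketch, assuming model_main_tf2.py is in the current directory (it lives under research/object_detection in the models repository); train_command and run_training are hypothetical helpers, and the paths are the same placeholders as in the command above.

```python
import subprocess
import sys
from typing import List


def train_command(config_path: str, model_dir: str, num_train_steps: int) -> List[str]:
    """Build the argv for model_main_tf2.py with an explicit step limit."""
    return [
        sys.executable, "model_main_tf2.py",
        f"--pipeline_config_path={config_path}",
        f"--model_dir={model_dir}",
        f"--num_train_steps={num_train_steps}",
        "--alsologtostderr",
    ]


def run_training(config_path: str, model_dir: str, num_train_steps: int) -> None:
    """Launch training; raises CalledProcessError if the script exits with an error."""
    subprocess.run(train_command(config_path, model_dir, num_train_steps), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_training(...) to actually start training.
    print(" ".join(train_command("config_path_here", "model_dir_here", 5000)))
```

Passing the arguments as a list avoids shell-quoting problems with paths that contain spaces.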

2. Yes, there is a way to convert checkpoints into a .pb file.

Try this:

python exporter_main_v2.py --trained_checkpoint_dir="checkpoint_dir_here" --pipeline_config_path="config_path_here" --output_directory "output_dir_here"

This writes the exported model to the output directory: a copy of the checkpoint, a saved_model folder containing saved_model.pb, and the pipeline config.
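After exporting, a quick check of the output directory can confirm the run worked. A sketch, assuming the layout exporter_main_v2.py writes (saved_model/saved_model.pb, a checkpoint folder and pipeline.config); missing_export_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def missing_export_files(output_dir: str) -> List[str]:
    """List the expected exporter outputs that are not present under output_dir."""
    expected = [
        Path("saved_model") / "saved_model.pb",
        Path("checkpoint"),
        Path("pipeline.config"),
    ]
    root = Path(output_dir)
    return [str(p) for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    problems = missing_export_files("output_dir_here")
    print("export looks complete" if not problems else f"missing: {problems}")
```

Note that this .pb belongs to a TF2 SavedModel: it is loaded with tf.saved_model.load on the saved_model folder, not as a TF1 frozen graph.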

Answer overview

Answer 1: score 3, accepted, 2017-11-08 10:30:44
Answer 2: score 2, 2017-11-03 09:51:31
Answer 3: score 0, 2017-11-05 20:45:07
Answer 4: score 0, 2017-11-06 06:32:32
Answer 5: score 0, 2020-11-15 11:04:10
