How can I stop a training job in TensorFlow?
I'm using this tutorial to train my own object detector ( https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html ). As far as I can see, it doesn't explain how to stop a training job or when to stop it. Can you please help me with this? I've been training my model for almost 24 hours, and my total loss is about 2.
Loss is a relative value. Unlike accuracy, it does not map directly to how well the model performs, so a value of 2 on its own does not tell you much. What matters is the trend: check whether the loss is still decreasing. If it is, you can keep training the model for more steps.
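To make "is the loss still decreasing?" concrete, here is a minimal sketch of a plateau check, assuming you have a list of total-loss values recorded at regular intervals during training. The function name `loss_still_decreasing` and the `window` and `min_delta` parameters are hypothetical, chosen only for illustration; they are not part of the Object Detection API.

```python
def loss_still_decreasing(losses, window=5, min_delta=0.01):
    """Hypothetical helper: compare the mean of the latest `window` loss readings
    with the mean of the `window` readings before them. Returns True if the
    recent mean is lower by at least `min_delta` (i.e. training still helps)."""
    if len(losses) < 2 * window:
        return True  # not enough history yet, so keep training
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return previous - recent >= min_delta


# Illustrative (made-up) loss readings: one still improving, one flat.
improving = [3.0, 2.8, 2.6, 2.4, 2.2, 2.0, 1.9, 1.8, 1.7, 1.6]
flat = [2.0] * 10
print(loss_still_decreasing(improving))  # True
print(loss_still_decreasing(flat))       # False
```

Averaging over a window rather than comparing two single readings smooths out the step-to-step noise that training loss usually shows; the thresholds are judgment calls you would tune for your own run.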
If your question is how to set the number of epochs: that is configured in the *.config file. Edit the config file to change the batch size and the number of steps (the `batch_size` and `num_steps` fields in the `train_config` block).
Number of epochs trained = (num_steps * batch size) / (number of images in training set), since each step processes one batch and one epoch takes (number of images / batch size) steps.
*One epoch is when the ENTIRE dataset is passed forward and backward through the neural network only ONCE.
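The epoch arithmetic above can be sketched in a few lines. This is an illustrative helper, not part of the Object Detection API; the function names and the example numbers (1000 training images, batch size 8) are hypothetical. It also works the formula backwards, to pick a `num_steps` value for a target number of epochs.

```python
import math


def epochs_trained(num_steps, batch_size, num_train_images):
    """Each step processes one batch, so the model sees num_steps * batch_size
    images in total; dividing by the dataset size gives the number of epochs."""
    return num_steps * batch_size / num_train_images


def steps_for_epochs(target_epochs, batch_size, num_train_images):
    """Inverse of the above: steps needed to cover the dataset target_epochs
    times, rounded up so the last partial epoch is not cut short."""
    return math.ceil(target_epochs * num_train_images / batch_size)


# Hypothetical example: 1000 training images, batch size 8.
print(epochs_trained(25000, 8, 1000))  # 200.0
print(steps_for_epochs(50, 8, 1000))   # 6250
```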