
How can I stop a training job in tensorflow?

I'm using this tutorial to train my own object detector ( https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html ). As far as I can see, it doesn't explain how or when to stop a training job. Can you please help me with this? I've been training my model for almost 24 hours, and my total loss is about 2.

Loss is a relative value: unlike accuracy, it does not map directly onto how well the model performs, so a value of 2 on its own does not provide much insight. What matters is the trend. If the loss is still decreasing, you can keep training the model for more steps.
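To make "is the loss still decreasing" a bit more concrete, here is a minimal sketch that compares the average of the most recent loss readings with the average of the readings just before them. The function name, the window size, and the 1% threshold are all assumptions made for illustration; the loss values would be ones you copy from your training log or TensorBoard, not something this snippet reads for you.

```python
def loss_has_plateaued(losses, window=3, min_rel_improvement=0.01):
    """Return True when the recent average loss improved by less than
    `min_rel_improvement` (a fraction) relative to the previous window.

    `losses` is a list of loss readings in chronological order.
    The window and threshold are illustrative defaults, not recommendations.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to judge the trend

    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window

    if previous <= 0:
        return True  # nothing left to improve on in relative terms

    relative_improvement = (previous - recent) / previous
    return relative_improvement < min_rel_improvement


# Hypothetical readings, typed in by hand for the example.
still_improving = [4.0, 3.5, 3.0, 2.6, 2.3, 2.0]
flat = [2.05, 2.0, 2.02, 2.01, 2.03, 2.0]

print(loss_has_plateaued(still_improving))  # loss keeps dropping
print(loss_has_plateaued(flat))             # loss hovers around 2
```

A check like this only describes the training loss trend. It says nothing about whether the detector is good enough, which still has to be judged on evaluation data.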

If your question is how to set the number of epochs, that is configured in the *.config file. You can edit the config file to change the values for batch size and number of steps.

Number of epochs trained = (num_steps * batch size) / Number of images in training set

*One epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly ONCE.
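Since each training step processes one batch, one epoch takes (number of training images / batch size) steps. The sketch below turns that into two small helpers: one converts a step count into epochs, the other goes the opposite way. The function names and the example numbers (1000 images, batch size 8) are hypothetical; substitute the values from your own dataset and config file.

```python
import math


def epochs_trained(num_steps, batch_size, num_train_images):
    """Epochs covered after `num_steps` steps of `batch_size` images each."""
    return num_steps * batch_size / num_train_images


def steps_for_epochs(target_epochs, batch_size, num_train_images):
    """Number of steps needed to cover `target_epochs` passes over the data,
    rounded up to a whole step."""
    return math.ceil(target_epochs * num_train_images / batch_size)


# Hypothetical setup: 1000 training images, batch size 8.
print(epochs_trained(num_steps=25000, batch_size=8, num_train_images=1000))
print(steps_for_epochs(target_epochs=50, batch_size=8, num_train_images=1000))
```

The second helper is handy when you have an epoch budget in mind and need the matching number of steps to put in the config file.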


 