

TensorFlow Colab: Runtime disconnected. The connection to the runtime has timed out

Why, after running a model for 2 hours, do I get a popup window saying:

    Runtime disconnected

    The connection to the runtime has timed out.


                     CLOSE             RECONNECT

I had restarted my runtime and thought I had 12 hours to train a model. Any idea how to avoid this? My other question: is it possible to find out how much time is left before the runtime gets disconnected, using a TF or Python API?

The runtime gets disconnected when the notebook stays in "idle" mode for more than 90 minutes. This is an unofficial number, as Google Colab has not published an official figure. This is how Google Colab cheekily gets away with it:

An extract from the official Colab FAQ:

Where is my code executed? What happens to my execution state if I close the browser window?

Code is executed in a virtual machine dedicated to your account. Virtual machines are recycled when idle for a while, and have a maximum lifetime enforced by the system.

So to avoid this, keep your browser open and don't let your system sleep for more than 90 minutes.

This also means that if you happen to close your browser and then reopen the notebook within 90 minutes, all your running processes and session variables will still be intact!

Also, note that currently you can run a notebook for a maximum of 12 hours (in the "non-idle" state, of course).

To answer your second question: this "idle state" behavior is a Colab thing, not a TF or Python thing, so I don't think there is a TF or Python API that reports the time left.
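
As far as I know there is no API for this, but as a rough workaround you could track the runtime's age yourself. This is a minimal sketch under stated assumptions: the 12-hour cap is taken as counted from when the runtime was assigned (Colab does not document this), and the marker file path `/tmp/runtime_start_time.txt` and both function names are hypothetical. Run it in the first cell right after connecting; files on the runtime's local disk survive a kernel restart but not a fresh VM, so the stored timestamp approximates the runtime's age. It cannot observe the 90-minute idle timeout, which depends on browser activity.

```python
import os
import time

# Hypothetical marker file. Local-disk files survive a kernel restart but not
# a fresh VM, so the stored timestamp approximates the runtime's age,
# provided this first runs right after connecting.
START_FILE = "/tmp/runtime_start_time.txt"

# Assumption: the ~12-hour cap is counted from when the runtime was
# assigned. Colab does not document the exact limit.
ASSUMED_MAX_LIFETIME_S = 12 * 60 * 60


def runtime_start_time(path: str = START_FILE) -> float:
    """Return the recorded start time, recording time.time() on the first call."""
    if os.path.exists(path):
        with open(path) as f:
            return float(f.read().strip())
    now = time.time()
    with open(path, "w") as f:
        f.write(repr(now))  # repr() round-trips a float exactly
    return now


def estimate_seconds_left(start: float, now: float,
                          max_lifetime_s: float = ASSUMED_MAX_LIFETIME_S) -> float:
    """Seconds left before the assumed cap, clamped at zero.

    The idle timeout is not accounted for: this code cannot observe it.
    """
    return max(0.0, max_lifetime_s - (now - start))


if __name__ == "__main__":
    start = runtime_start_time()
    left = estimate_seconds_left(start, time.time())
    print(f"Roughly {left / 3600:.2f} h left, assuming a 12 h cap")
```

Treat the printed number as an upper bound at best: the system may recycle the VM earlier than the assumed cap.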

So it is good practice to save your model to a folder periodically. This way, if your runtime unfortunately gets disconnected, your work will not be lost, and you can simply resume training from the latest saved model!
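
The save-periodically-then-resume pattern can be sketched framework-agnostically. In this minimal example, JSON files stand in for real model weights, and the file naming scheme and the names `save_checkpoint`, `latest_checkpoint` and `train` are hypothetical. One assumption worth stating: the folder must outlive the VM (for example a Google Drive directory mounted with `google.colab.drive.mount`), because the VM's local disk is lost when it is recycled.

```python
import json
import os
import re
from typing import Optional, Tuple

# Hypothetical naming scheme: one file per saved epoch.
CKPT_RE = re.compile(r"^ckpt_epoch_(\d+)\.json$")


def save_checkpoint(ckpt_dir: str, epoch: int, state: dict) -> str:
    """Write `state` for `epoch` via a temp file plus rename, so a disconnect
    mid-write does not leave a half-written checkpoint behind."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"ckpt_epoch_{epoch:04d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp, path)  # atomic on the same filesystem
    return path


def latest_checkpoint(ckpt_dir: str) -> Optional[Tuple[int, dict]]:
    """Return (epoch, state) from the newest checkpoint, or None if none exist."""
    if not os.path.isdir(ckpt_dir):
        return None
    epochs = []
    for name in os.listdir(ckpt_dir):
        m = CKPT_RE.match(name)
        if m:
            epochs.append(int(m.group(1)))
    if not epochs:
        return None
    path = os.path.join(ckpt_dir, f"ckpt_epoch_{max(epochs):04d}.json")
    with open(path) as f:
        data = json.load(f)
    return data["epoch"], data["state"]


def train(ckpt_dir: str, total_epochs: int, save_every: int = 1) -> dict:
    """Toy training loop that resumes from the latest checkpoint if one exists."""
    resumed = latest_checkpoint(ckpt_dir)
    if resumed is None:
        start_epoch, state = 0, {"weight": 0.0}
    else:
        last_epoch, state = resumed
        start_epoch = last_epoch + 1
    for epoch in range(start_epoch, total_epochs):
        state["weight"] += 1.0  # stand-in for one epoch of real training
        # Save every `save_every` epochs, and always after the final epoch.
        if (epoch + 1) % save_every == 0 or epoch == total_epochs - 1:
            save_checkpoint(ckpt_dir, epoch, state)
    return state
```

Calling `train(d, 3)` and then, after a simulated disconnect, `train(d, 5)` runs only epochs 3 and 4 the second time. With a Keras model, `tf.keras.callbacks.ModelCheckpoint` and `tf.train.CheckpointManager` already handle the saving and latest-checkpoint bookkeeping.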

PS: I got the 90-minute figure from an experiment done by a fellow user.
