How to detect a problem in the watchdog daemon on Linux (Debian) before the watchdog reboots the OS
I am working on an application project on Debian Linux that uses the software watchdog to monitor other services through the PID files those services create.
I followed the steps at http://linux.die.net/man/5/watchdog.conf and installed it with
apt-get install watchdog
The mechanism is that watchdog checks the PID files configured in /etc/watchdog.conf to see whether each service is still running.
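For illustration, the PID-file checks described above would look roughly like the fragment below. The `pidfile` and `watchdog-device` options come from the watchdog.conf man page; the two PID file paths are hypothetical placeholders for whichever services are being monitored.

```
# /etc/watchdog.conf (fragment)
watchdog-device = /dev/watchdog

# One pidfile line per monitored service; the paths below are assumed examples.
pidfile = /var/run/service-a.pid
pidfile = /var/run/service-b.pid
```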
I tested it by stopping a service with service service-name stop.
Watchdog detects that the service is no longer running and reboots the system after a number of seconds equal to the watchdog timeout period.
Consider a product without a display: if, for example, a service's configuration files are corrupted, the system would reboot endlessly without any notification to the end user.
In practice, I want to learn the watchdog's status before it takes a reboot/halt/soft-restart action, so that a programmer can implement notification logic for the end user.
Otherwise, is it possible to modify the watchdog init script in /etc/init.d/ to call a user program when the software watchdog stops, so that the programmer can maintain a counter in non-volatile memory and avoid an endless reboot loop?
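The counter idea above could be sketched as follows. This is only a minimal illustration of a persistent reboot counter, not something the watchdog package provides: the file path, the threshold, and all function names are assumptions, and how the script gets called at boot (init script, rc.local, or similar) is left open.

```python
import json
import os
import tempfile

# Hypothetical location on persistent storage; adjust for the target device.
COUNTER_PATH = "/var/lib/myapp/reboot_count.json"
MAX_REBOOTS = 3  # assumed threshold, not a watchdog setting


def read_count(path):
    """Return the stored reboot count, or 0 if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return int(json.load(f)["count"])
    except (OSError, ValueError, KeyError, TypeError):
        return 0


def write_count(path, count):
    """Write the count via a temp file and rename so a reset mid-write cannot corrupt it."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"count": count}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def record_boot(path, max_reboots):
    """Increment the counter; return True while the count is within the limit."""
    count = read_count(path) + 1
    write_count(path, count)
    return count <= max_reboots


def mark_healthy(path):
    """Reset the counter once the system has run long enough to be considered stable."""
    write_count(path, 0)
```

A boot-time script could call `record_boot(COUNTER_PATH, MAX_REBOOTS)` and, when it returns False, leave the watchdog daemon stopped and alert the user by some other means. Calling `mark_healthy` after a period of stable uptime keeps occasional reboots from accumulating toward the limit.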
Beyond that, I would like to know more about how to get status from this software watchdog (the watchdog daemon). I have set it up to monitor services, CPU overload, temperature, etc., but I do not receive any event before the watchdog acts, so I cannot tell whether the system restarted because of a stopped service, CPU overheating, CPU overload, or something else.
A watchdog is designed as a last resort to rescue a system after it has failed beyond recovery. A hardware watchdog physically resets the CPU and is used to make sure that a system does not stay hung for long periods.
There is no way to receive a warning in software that this is about to happen, because the watchdog assumes that all software has failed.
If you need a solution that detects that a process is no longer responding, you should keep it separate from the watchdog.
See the answers to this question for something similar: Designing a monitor process for monitoring and restarting processes
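As a rough sketch of what such a separate monitor might check, the functions below read PID files and test whether each process is still alive. The function names and the mapping of service names to PID files are hypothetical; the liveness test relies on sending signal 0 with `os.kill`, which performs the existence check on Linux without delivering a signal.

```python
import errno
import os


def pid_from_file(path):
    """Return the PID stored in a PID file, or None if it is missing or invalid."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid):
    """Check for a live process by sending signal 0 (no signal is delivered)."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # EPERM means the process exists but belongs to another user.
        return exc.errno == errno.EPERM
    return True


def failed_services(pidfiles):
    """Return the names of services whose PID file is missing, invalid, or stale."""
    failed = []
    for name, path in pidfiles.items():
        pid = pid_from_file(path)
        if pid is None or not is_running(pid):
            failed.append(name)
    return failed
```

A monitor built on this would call `failed_services` at an interval shorter than the watchdog timeout and log or report the returned names, so the cause is recorded before any reboot takes place.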