How to automatically restart a systemd service that is killed due to OOM

How do I automatically restart a systemd service that is killed due to OOM?

I have added a restart setting, but I am not sure whether it applies to OOM kills. I cannot reproduce the OOM on my local dev box, so it would help to know that this works.

[Service]
Restart=on-failure
RestartSec=1s

Error:

Main process exited, code=killed, status=9/KILL

Reading the docs at https://www.freedesktop.org/software/systemd/man/systemd.service.html, it looks like the restart happens on an unclean exit code, and I think status 9 would fall under that, but can someone please validate my thinking?

When a process terminates, the nature of its termination is made available to its parent process of record. For services started by systemd, the parent is systemd.

Two alternatives can be reported: termination because of a signal (and which signal), or normal termination (with the accompanying exit status). By "normal" I mean the complement of "killed by a signal", not necessarily "clean" or "successful".
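To make the two cases concrete, here is a minimal Python sketch of what a parent process observes. It is only an illustration on a POSIX system, not how systemd itself is implemented; the helper name `describe_termination` is hypothetical, and the two child commands are stand-ins for a real service.

```python
import signal
import subprocess
import sys


def describe_termination(returncode: int) -> str:
    """Classify a child's termination as its parent sees it (hypothetical helper)."""
    if returncode < 0:
        # subprocess reports death by signal as the negated signal number.
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Child that would sleep for a minute, killed the way the OOM killer would do it.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.send_signal(signal.SIGKILL)
killed = describe_termination(child.wait())

# Child that terminates "normally" (not by a signal) but with a nonzero status.
finished = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
exited = describe_termination(finished.returncode)

print(killed)  # killed by signal SIGKILL
print(exited)  # exited with status 3
```

The first case corresponds to the `code=killed, status=9/KILL` line in the question; the second is a "normal" termination in the sense used above, even though it is not a successful one.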

The system interfaces for process management do not provide any other options, but systemd itself also supports applying a timeout to, or using a watchdog timer with, the services it manages. Either can lead to service termination when one of them expires (as systemd accounts it).

The systemd documentation of the various Restart settings gives good detail on which termination circumstances lead to a restart under which setting. The message presented in the question shows termination because of a SIGKILL, which falls into the "unclean signal" category as systemd defines it. Thus, following the docs, configuring a service's Restart property to any of always, on-failure, on-abnormal, or on-abort would result in systemd automatically restarting that service if it terminates because of a SIGKILL. The Restart=on-failure setting in the question therefore covers this case.

Most of those options also produce automatic restarts under other circumstances, but on-abort yields automatic restarts only in the event of termination because of an unclean signal. (Note that although systemd considers SIGKILL unclean, it considers SIGTERM clean.)
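The mapping described above can be written down as a small lookup, sketched here in Python. The table is transcribed from the Restart= table in the systemd.service man page as I understand it; the cause labels (`"unclean-signal"` and so on) and the function name are my own shorthand, not systemd identifiers.

```python
# Exit causes that trigger a restart for each Restart= setting.
# Cause labels are hypothetical shorthand for the rows of the man page table.
RESTART_TABLE = {
    "no": set(),
    "always": {"clean", "unclean-exit-code", "unclean-signal", "timeout", "watchdog"},
    "on-success": {"clean"},
    "on-failure": {"unclean-exit-code", "unclean-signal", "timeout", "watchdog"},
    "on-abnormal": {"unclean-signal", "timeout", "watchdog"},
    "on-abort": {"unclean-signal"},
    "on-watchdog": {"watchdog"},
}


def settings_that_restart(cause: str) -> list[str]:
    """Return the Restart= values under which the given exit cause leads to a restart."""
    return sorted(name for name, causes in RESTART_TABLE.items() if cause in causes)


# SIGKILL (status=9/KILL) is an unclean signal.
print(settings_that_restart("unclean-signal"))
# ['always', 'on-abnormal', 'on-abort', 'on-failure']
```

Reading the table this way also shows why on-abort is the narrowest choice: "unclean-signal" is the only cause listed for it.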
