Restarting an unhealthy docker container based on healthcheck

I am using Docker version 17.09.0-ce, and I see that containers are marked as unhealthy. Is there an option to get the container restarted instead of keeping it in the unhealthy state?

Restarting unhealthy containers was part of the original PR (https://github.com/moby/moby/pull/22719), but it was removed after discussion and deferred as a later enhancement of RestartPolicy.

For now, you can use this workaround to restart unhealthy containers automatically: https://hub.docker.com/r/willfarrell/autoheal/

Here is a sample compose file:

version: '2'
services:
  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Then simply run docker-compose up -d with this file.
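
Conceptually, a tool like autoheal repeatedly asks Docker which containers are unhealthy and restarts them. The Python sketch below illustrates that polling idea with the docker CLI (docker ps --filter health=unhealthy and docker restart). It is not autoheal's actual code. The function names and the 5-second interval are assumptions for illustration. Like AUTOHEAL_CONTAINER_LABEL=all, it watches every running container and does no label filtering.

```python
import subprocess
import time
from typing import Callable, List

# A "runner" takes docker CLI arguments and returns stdout; injectable for testing.
Runner = Callable[[List[str]], str]


def run_docker(args: List[str]) -> str:
    """Run a docker CLI command and return its stdout (raises on failure)."""
    result = subprocess.run(
        ["docker", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def unhealthy_container_ids(run: Runner = run_docker) -> List[str]:
    """IDs of running containers whose health status is 'unhealthy'."""
    out = run(["ps", "--filter", "health=unhealthy", "--format", "{{.ID}}"])
    return [line.strip() for line in out.splitlines() if line.strip()]


def restart_unhealthy_once(run: Runner = run_docker) -> List[str]:
    """Restart every currently unhealthy container; return the restarted IDs."""
    ids = unhealthy_container_ids(run)
    for cid in ids:
        run(["restart", cid])
    return ids


def poll_forever(interval_s: float = 5.0, run: Runner = run_docker) -> None:
    """Hypothetical main loop: check, restart, sleep, repeat (never returns)."""
    while True:
        restarted = restart_unhealthy_once(run)
        if restarted:
            print("restarted:", " ".join(restarted), flush=True)
        time.sleep(interval_s)
```

Calling poll_forever() on the Docker host would need the docker CLI and access to the daemon. Inside a container, that means mounting /var/run/docker.sock as in the compose file above.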

You can automatically restart an unhealthy container by combining a smart HEALTHCHECK with a proper restart policy.

The Docker restart policy should be either always or unless-stopped.

The HEALTHCHECK, in turn, should implement logic that kills the container when it is unhealthy.

In the following example, I used curl with its built-in retry mechanism and chained it with || to the kill command. As a result, kill only runs when curl fails, meaning the service is unhealthy.

HEALTHCHECK --interval=5m --timeout=2m --start-period=45s \
   CMD curl -f --retry 6 --max-time 5 --retry-delay 10 --retry-max-time 60 "http://localhost:8080/health" || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'

The important thing to understand here is that the retry logic lives entirely in the curl command. Docker's own retry setting (--retries) still applies but is effectively unused. If all of curl's attempts fail (up to 6 retries, with no new retry started after 60 seconds), kill is executed. Be aware that curl only retries transient errors such as timeouts and HTTP 408, 429, 500, 502, 503 or 504 responses. A refused connection fails at once unless --retry-connrefused is added.

The kill command first sends SIGTERM to the processes in the container so they can stop gracefully. After 10 seconds it sends SIGKILL to terminate whatever is left. Keep in mind that when the PID 1 of a container dies, the container itself dies and the restart policy is invoked. However, kill -1 signals every process except PID 1 and the caller. So this approach relies on PID 1 exiting once its children are gone, for example a shell wrapper script or an init such as tini.
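
As a rough sanity check on the numbers in the HEALTHCHECK above, consider the worst case. Every curl attempt runs for the full --max-time and every retry waits the full --retry-delay; --retry-max-time can only shorten this. Adding the 10-second pause before SIGKILL, the small Python calculation below suggests the whole command should finish within --timeout=2m. It ignores process start-up overhead, and the function name is only illustrative.

```python
def curl_worst_case_s(retries: int, max_time_s: float, retry_delay_s: float) -> float:
    """Upper bound on curl's run time: (retries + 1) attempts, each capped by
    --max-time, plus one --retry-delay pause before each retry."""
    attempts = retries + 1
    return attempts * max_time_s + retries * retry_delay_s


# Numbers copied from the HEALTHCHECK line above.
curl_bound = curl_worst_case_s(retries=6, max_time_s=5, retry_delay_s=10)
kill_grace_s = 10        # the "sleep 10" between SIGTERM and SIGKILL
timeout_s = 2 * 60       # --timeout=2m

total = curl_bound + kill_grace_s
print(f"worst case {total}s vs timeout {timeout_s}s")  # worst case 105s vs timeout 120s
```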

Gotcha: kill behaves differently in bash than in sh. In bash, you can use -1 to send the signal to all processes with a PID greater than 1.

For standalone containers, Docker has no native support for restarting a container when its health check fails, though the same can be achieved with Docker events and a script. Health checks are better integrated with Swarm. When a container in a service becomes unhealthy, Swarm automatically shuts it down and starts a new one, keeping the number of containers at the service's replica count.
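
A minimal Python sketch of the "Docker events and a script" approach follows. It reads docker events for containers, printed as one JSON object per line via --format '{{json .}}', and runs docker restart for any container whose event action is "health_status: unhealthy". The function names are hypothetical. It assumes the docker CLI is on the PATH and can reach the daemon.

```python
import json
import subprocess
from typing import Optional


def container_to_restart(event_line: str) -> Optional[str]:
    """Return the container ID if this event line reports 'unhealthy'."""
    try:
        event = json.loads(event_line)
    except json.JSONDecodeError:
        return None  # ignore blank or malformed lines
    if not isinstance(event, dict) or event.get("Type") != "container":
        return None
    if event.get("Action") != "health_status: unhealthy":
        return None
    return event.get("Actor", {}).get("ID")


def watch_and_restart() -> None:
    """Block forever, restarting containers as they turn unhealthy."""
    proc = subprocess.Popen(
        ["docker", "events", "--filter", "type=container", "--format", "{{json .}}"],
        stdout=subprocess.PIPE,
        text=True,
    )
    for line in proc.stdout:
        cid = container_to_restart(line)
        if cid:
            print(f"{cid[:12]} is unhealthy, restarting", flush=True)
            subprocess.run(["docker", "restart", cid], check=False)
```

The action check is done in Python rather than with an event filter. This keeps the sketch independent of how a given Docker version matches filter values.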

You can try putting something like this in your Dockerfile:

HEALTHCHECK --interval=5s --timeout=2s CMD curl --fail http://localhost || kill 1

Don't forget the --restart always option.

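kill 1 sends SIGTERM to the process with PID 1 in the container. When that process exits, the container exits too. Usually, the process started by CMD or ENTRYPOINT has PID 1. Note that PID 1 in a container ignores signals it has not installed a handler for, so this only works if the main process handles SIGTERM.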

Unfortunately, this method likely never marks the container as unhealthy. When kill 1 succeeds, the whole check command exits with status 0, which Docker records as a passing check. So be careful with it.
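
Here is a minimal sketch of a Python main process that handles SIGTERM, so that the signal sent by kill 1 from the health check actually makes it exit. The work loop is a placeholder, and the exit status 143 (128 + SIGTERM) is a convention chosen here, not something Docker requires.

```python
import signal
import sys
import time


def handle_sigterm(signum, frame):
    """Exit when SIGTERM arrives (e.g. from 'kill 1' in the health check)."""
    # Put cleanup here (close connections, flush buffers, ...).
    # 128 + signal number is the conventional "terminated by signal" status.
    sys.exit(128 + signum)


def install_sigterm_handler() -> None:
    signal.signal(signal.SIGTERM, handle_sigterm)


def main() -> None:
    install_sigterm_handler()
    while True:          # placeholder for the real service
        time.sleep(1)
```

Because the exit status is non-zero, both --restart always and an on-failure policy would restart the container.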

Docker has a couple of ways to get details on container health. You can configure health checks and how often they run. Health checks can also target applications running inside the container, for example over HTTP (using the curl --fail option). You can view the health_status event to get details.
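
If the image has no curl, the same kind of HTTP check can be written with Python's standard library. The sketch below assumes a hypothetical /health endpoint on port 8080 and mirrors curl --fail: a refused connection, a timeout or an HTTP status of 400 or above counts as a failure. A failure is reported as 1, which Docker treats as unhealthy; 0 means healthy.

```python
import http.client
import urllib.request

# Hypothetical endpoint; adjust host, port and path to your service.
DEFAULT_URL = "http://localhost:8080/health"


def http_health_check(url: str = DEFAULT_URL, timeout_s: float = 5.0) -> int:
    """Return 0 (healthy) if url answers with a status below 400, else 1.

    Mirrors curl --fail: refused connections, timeouts and HTTP errors
    all count as failures.
    """
    # Ignore http_proxy/https_proxy settings: the check targets the container itself.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            return 0 if resp.status < 400 else 1
    except (OSError, http.client.HTTPException):
        # urllib's HTTPError (status >= 400) and URLError are OSError subclasses,
        # as are refused connections and socket timeouts.
        return 1
```

Saved as, say, healthcheck.py in the image's working directory (a hypothetical name), it could be wired up with HEALTHCHECK CMD python -c "import sys, healthcheck; sys.exit(healthcheck.http_health_check())".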

For detailed information on an unhealthy container, the inspect command comes in handy: docker inspect --format='{{json .State.Health}}' container-name (see https://blog.newrelic.com/2016/08/24/docker-health-check-instruction/ for more details).
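
The JSON printed by that docker inspect command contains three things of interest:

- the current Status,
- the FailingStreak counter,
- a Log of recent check results, each with Start, End, ExitCode and Output.

A small Python sketch that pulls out the most useful parts follows. The function name is illustrative and the sample input is made up. For a container without a health check, the command prints null, which the sketch also handles.

```python
import json
from typing import Any, Dict


def summarize_health(health_json: str) -> Dict[str, Any]:
    """Condense the output of docker inspect --format='{{json .State.Health}}'."""
    health = json.loads(health_json) or {}   # "null" when no health check is set
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last.get("ExitCode"),
        "last_output": (last.get("Output") or "").strip(),
    }


# Made-up sample in the shape docker inspect produces.
sample = json.dumps({
    "Status": "unhealthy",
    "FailingStreak": 3,
    "Log": [
        {"Start": "2017-10-20T10:00:00Z", "End": "2017-10-20T10:00:01Z",
         "ExitCode": 1, "Output": "curl: (7) Failed to connect\n"},
    ],
})
print(summarize_health(sample))
```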

You should first resolve the error condition that causes the "unhealthy" status: the health check command exiting with code 1 on --retries consecutive runs (3 by default). Depending on the error, this may or may not require Docker to restart the container. If you start or restart your containers automatically, then trapping the start errors, or logging them along with the health check status, can help you address errors quickly. Check the link if you are interested in auto start.
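
How Docker turns individual check results into a status can be modelled roughly as follows. The status starts as "starting". Exit code 0 resets the failure streak and marks the container healthy. Any other exit code increments the streak, and once the streak reaches --retries the container is marked unhealthy. This is a simplified Python model for illustration, not Docker's code. It ignores the start period, during which failures are not counted.

```python
from typing import Iterable


def health_status(exit_codes: Iterable[int], retries: int = 3) -> str:
    """Replay health check exit codes and return the resulting status."""
    status = "starting"      # initial status before any check decides otherwise
    failing_streak = 0
    for code in exit_codes:
        if code == 0:
            failing_streak = 0
            status = "healthy"
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
    return status


print(health_status([1, 1]))        # starting
print(health_status([1, 1, 1]))     # unhealthy
print(health_status([0, 1, 1]))     # healthy
print(health_status([1, 1, 1, 0]))  # healthy
```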

According to https://codeblog.dotsandbrackets.com/docker-health-check/

Create the container and add "restart: always".

When using healthcheck, keep the following in mind. For standalone containers, Docker does not natively restart a container on health check failure, though the same can be achieved with Docker events and a script. Health checks are better integrated with Swarm, which automatically replaces unhealthy containers to keep a service at its replica count.

I was able to resolve this error by using the --force-recreate option:

docker-compose up --force-recreate <service>
