Stopping a Python container is slow - SIGTERM not passed to the python process?

Question

I made a simple Python web server based on this example, which runs inside Docker:

FROM python:3-alpine
WORKDIR /app

COPY entrypoint.sh .
RUN chmod +x entrypoint.sh

COPY src src
CMD ["python", "/app/src/api.py"]
ENTRYPOINT ["/app/entrypoint.sh"]

Entrypoint:

#!/bin/sh
echo starting entrypoint
set -x
exec "$@"

Stopping the container took very long, although the exec statement combined with the JSON array syntax should pass the signal on to the python process. I assumed the problem was SIGTERM not being passed to the container, so I added the following to my api.py script to detect SIGTERM:

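# Excerpt from api.py: the imports, hostName, serverPort and MyServer are defined elsewhere in the script.
def terminate(signum, frame):  # named signum so the signal module is not shadowed
    print("TERMINATING")

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, terminate)

    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s" % (hostName, serverPort))
    webServer.serve_forever()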

Running the script without Docker (python3 api/src/api.py), I tried

kill -15 $(ps -guaxf | grep python | grep -v grep | awk '{print $2}')

to send SIGTERM (15 is its signal number). The script prints TERMINATING, so my signal handler works. Now I run the Docker container using docker-compose and press CTRL + C. Docker says gracefully stopping... (press Ctrl+C again to force) but doesn't print the termination message from my signal handler.

I also tried running docker-compose in detached mode, then running docker-compose kill -s SIGTERM api and viewing the logs. Still no message from the signal handler.

Answer 1

By default, Docker runs your application in the foreground, so it runs as PID 1. The process with PID 1 has a special meaning and specific protections in Linux.

This is highlighted in the docker run documentation:

Note

A process running as PID 1 inside a container is treated specially by Linux: it ignores any signal with the default action. As a result, the process will not terminate on SIGINT or SIGTERM unless it is coded to do so.

Source: https://docs.docker.com/engine/reference/run/#foreground
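The "unless it is coded to do so" part of that note can be checked outside Docker. The following is only a minimal sketch, assuming a POSIX system and that it runs in the main thread; the helper name handler_runs_on_sigterm is hypothetical and not part of the original question.

```python
import os
import signal


def handler_runs_on_sigterm():
    """Install a SIGTERM handler, signal this process, report whether it ran."""
    received = []

    def on_term(signum, frame):
        # Runs instead of the default action (which would end the process).
        received.append(signum)

    previous = signal.signal(signal.SIGTERM, on_term)
    try:
        os.kill(os.getpid(), signal.SIGTERM)  # send SIGTERM to ourselves
    finally:
        # Restore whatever was installed before (None means "not set from Python").
        signal.signal(signal.SIGTERM,
                      previous if previous is not None else signal.SIG_DFL)
    return received == [signal.SIGTERM]
```

A process that installs a handler like this reacts to SIGTERM even as PID 1; whether it then actually exits is up to the handler.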

To fix this when running a single container, you can pass the --init flag to docker run:

You can use the --init flag to indicate that an init process should be used as the PID 1 in the container. Specifying an init process ensures the usual responsibilities of an init system, such as reaping zombie processes, are performed inside the created container.

Source: https://docs.docker.com/engine/reference/run/#specify-an-init-process

The same configuration is possible in docker-compose, simply by specifying init: true on the service.

An example would be:

version: "3.8"
services:
  web:
    image: alpine:latest
    init: true

Source: https://docs.docker.com/compose/compose-file/#init

Answer 2

Since the script runs as PID 1 as desired and setting init: true in docker-compose.yml didn't seem to change anything, I took a deeper dive into this topic. That led me to discover several mistakes I had made:

Logging

Printing a message when SIGTERM is caught was meant as a simple test case, to see whether signal handling basically works before I worried about stopping the server. But I noticed that no message appears, for two reasons:

Output buffering

When running a long-lived process in Python such as the HTTP server (or any while True loop, for example), no output is displayed when starting the container attached with docker-compose up (no -d flag). To receive live logs, we need to start python with the -u flag or set the environment variable PYTHONUNBUFFERED=TRUE.
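If changing the interpreter flags or the environment is not convenient, a similar effect can be had from inside the script by flushing after each message. This is only a sketch; the helper name log is hypothetical and not part of the original script.

```python
import sys


def log(message, stream=None):
    """Write one line and flush it immediately so it shows up in the container logs."""
    stream = sys.stdout if stream is None else stream
    # flush=True pushes the text out of Python's buffer right away.
    print(message, file=stream, flush=True)
```

Calling log("Server started") instead of print("Server started") makes that line appear without waiting for the buffer to fill. Unlike -u or PYTHONUNBUFFERED, it only affects messages written through this helper.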

No log piping after stop

But the main problem was not the output buffering (that is just a side note, since I wondered why there was no log output from the container). When you cancel the container, docker-compose stops piping logs to the console. This means it cannot display anything that happens AFTER CTRL + C is pressed.

To fetch those logs, we need to wait until docker-compose has stopped the container and then run docker-compose logs. It prints everything, including the output generated after CTRL + C was pressed. Using docker-compose logs, I found out that SIGTERM is passed to the container and my signal handler works.

Stopping the webserver

With that knowledge I tried to stop the web server instance. At first this didn't work, because it's not enough to just call webServer.server_close(). It's required to exit explicitly after any cleanup work is done, like this:

def terminate(signal,frame):
  print("Start Terminating: %s" % datetime.now())
  webServer.server_close()
  sys.exit(0)

When sys.exit() is not called, the process keeps running, which results in ~10s of waiting time (Docker's default stop timeout) before Docker kills it.
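As an alternative to calling sys.exit(0) inside the handler, the server can be asked to leave serve_forever() so that the script ends by returning normally. This is a sketch of a different technique than the one used in this answer; the names Hello and serve_until_sigterm are hypothetical. It relies on the documented behaviour that shutdown() must be called from a thread other than the one running serve_forever().

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Hello from Python Webserver")


def serve_until_sigterm(server):
    """Run server.serve_forever() in the calling (main) thread until SIGTERM arrives."""

    def on_term(signum, frame):
        # shutdown() blocks until serve_forever() has returned, so calling it
        # directly here (same thread as serve_forever) would deadlock.
        threading.Thread(target=server.shutdown).start()

    signal.signal(signal.SIGTERM, on_term)
    try:
        server.serve_forever()
    finally:
        server.server_close()  # release the listening socket
```

A script would call serve_until_sigterm(HTTPServer(("0.0.0.0", 80), Hello)) as its last statement. Once it returns, the interpreter exits on its own, so Docker does not have to wait for the stop timeout.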

Full working example

Here is a demo script that implements everything I've learned:

from http.server import BaseHTTPRequestHandler, HTTPServer
import signal
from datetime import datetime
import sys, os

hostName = "0.0.0.0"
serverPort = 80

class MyServer(BaseHTTPRequestHandler):
  def do_GET(self):
    self.send_response(200)
    self.send_header("Content-Type", "text/html")
    self.end_headers()
    self.wfile.write(bytes("Hello from Python Webserver", "utf-8"))

webServer = None

def terminate(signum, frame):
  # Called on SIGTERM: close the listening socket, then exit explicitly.
  print("Start Terminating: %s" % datetime.now())
  webServer.server_close()
  sys.exit(0)

if __name__ == "__main__":
    signal.signal(signal.SIGTERM, terminate)

    webServer = HTTPServer((hostName, serverPort), MyServer)
    print("Server started http://%s:%s with pid %i" % (hostName, serverPort, os.getpid()))
    webServer.serve_forever()

Running in a container, it can be stopped very quickly, without waiting for Docker to kill the process:

$ docker-compose up --build -d
$ time docker-compose down
Stopping python-test_app_1 ... done
Removing python-test_app_1 ... done
Removing network python-test_default

real    0m1,063s
user    0m0,424s
sys     0m0,077s
