
Control order of container termination in a single pod in Kubernetes

I have two containers inside one pod. One is my application container and the second is a CloudSQL proxy container. Basically, my application container depends on this CloudSQL container.

The problem is that when a pod is terminated, the CloudSQL proxy container is terminated first, and my application container is terminated only some seconds later.

So, until my container is terminated, it keeps sending requests to the CloudSQL container, resulting in errors:

could not connect to server: Connection refused Is the server running on host "127.0.0.1" and accepting TCP/IP connections on port 5432

That's why I thought it would be a good idea to specify the order of termination, so that my application container is terminated first and only then the CloudSQL one.

I was unable to find anything in the documentation that could do this, but maybe there is some way.

This is not directly possible with the Kubernetes pod API at present. Containers may be terminated in any order. The Cloud SQL proxy container may die more quickly than your application, for example if it has less cleanup to perform or fewer in-flight requests to drain.

From Termination of Pods:

When a user requests deletion of a pod, the system records the intended grace period before the pod is allowed to be forcefully killed, and a TERM signal is sent to the main process in each container.


You can get around this to an extent by wrapping the Cloud SQL and main containers in different entrypoints, which communicate their exit status to each other using a shared pod-level file system.

This solution will not work with the 1.16 release of the Cloud SQL proxy (see comments), as this release ceased to bundle a shell with the container. The 1.17 release is available in Alpine or Debian Buster variants, so it is a viable upgrade target that is once again compatible with this solution.

A wrapper like the following may help with this:

containers:
- name: main
  image: <your application image>
  command: ["/bin/bash", "-c"]
  args:
  - |
    trap "touch /lifecycle/main-terminated" EXIT
    <your entry point goes here>
  volumeMounts:
  - name: lifecycle
    mountPath: /lifecycle
- name: cloudsql-proxy
  image: gcr.io/cloudsql-docker/gce-proxy
  command: ["/bin/bash", "-c"]
  args:
  - |
    /cloud_sql_proxy <your flags> &
    PID=$!

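    function stop {
        # Poll until the main container has dropped its marker file,
        # then stop the proxy and return so the shell can exit.
        while true; do
            if [[ -f "/lifecycle/main-terminated" ]]; then
                kill "$PID" 2>/dev/null
                return 0
            fi
            sleep 1
        done
    }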
    trap stop EXIT
    # We explicitly call stop to ensure the sidecar will terminate
    # if the main container exits outside a request from Kubernetes
    # to kill the Pod.
    stop &
    wait $PID
  volumeMounts:
  - name: lifecycle
    mountPath: /lifecycle

You'll also need a local scratch space to use for communicating lifecycle events:

volumes:
- name: lifecycle
  emptyDir: {}

How does this solution work? In the Cloud SQL proxy container, it intercepts the SIGTERM signal that Kubernetes sends to each of your pod's containers on shutdown. The "main process" running in that container is a shell, which has spawned a child process running the Cloud SQL proxy. Thus, the Cloud SQL proxy is not immediately terminated. Rather, the shell code blocks, waiting for a signal from the main container (by the simple means of a file appearing in the file system) that it has exited. Only at that point is the Cloud SQL proxy process terminated, and the sidecar container returns.
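To see the handshake in isolation, here is a minimal local sketch that needs no cluster. Everything in it is a stand-in chosen for illustration: a temporary directory (`LIFECYCLE_DIR`) plays the role of the shared `/lifecycle` volume, `sleep 300` plays the Cloud SQL proxy, and a short-lived subshell plays the main container. It is not the manifest above, only the same idea reduced to one script.

```shell
# Local simulation of the file-based shutdown handshake (no Kubernetes needed).
# LIFECYCLE_DIR is a hypothetical stand-in for the shared emptyDir volume.
LIFECYCLE_DIR="$(mktemp -d)"
MARKER="$LIFECYCLE_DIR/main-terminated"
LOG="$LIFECYCLE_DIR/events.log"

# Stand-in for the Cloud SQL proxy: a long-running child process.
sleep 300 &
PROXY_PID=$!

# Sidecar logic: block until the marker file appears, then stop the "proxy".
wait_then_stop_proxy() {
    until [ -f "$MARKER" ]; do
        sleep 1
    done
    echo "sidecar: marker seen, stopping proxy" >> "$LOG"
    kill "$PROXY_PID" 2>/dev/null
}

wait_then_stop_proxy &
WATCHER_PID=$!

# Stand-in for the main container: do some "work", then exit.
# The EXIT trap drops the marker file on the way out.
(
    trap 'echo "main: exited" >> "$LOG"; touch "$MARKER"' EXIT
    sleep 1
)

# Wait for the watcher, then collect the proxy's exit status.
wait "$WATCHER_PID"
wait "$PROXY_PID" 2>/dev/null
PROXY_STATUS=$?

cat "$LOG"
```

The log records the main process exiting before the sidecar stops the proxy, which is the ordering the wrapper is meant to enforce. `PROXY_STATUS` reflects that the stand-in proxy was ended by SIGTERM rather than exiting on its own.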

Of course, this has no effect on forced termination in the event your containers take too long to shut down and exceed the configured grace period.
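If you would rather have the sidecar stop the proxy itself shortly before the grace period (the pod's `terminationGracePeriodSeconds`) runs out, one option is to bound the wait. The helper below is a hypothetical sketch, not part of the original wrapper: `wait_for_marker` and its timeout argument are names made up for illustration, and the timeout you pick is assumed to be somewhat shorter than your grace period.

```shell
# Hypothetical helper: wait for a marker file, but give up after a deadline.
# Usage: wait_for_marker <path> <timeout-seconds>
# Returns 0 if the marker appeared, 1 if the deadline passed first.
wait_for_marker() {
    marker=$1
    remaining=$2
    while [ ! -f "$marker" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Small demonstration using a temporary directory instead of /lifecycle.
DEMO_DIR="$(mktemp -d)"
touch "$DEMO_DIR/present"

wait_for_marker "$DEMO_DIR/present" 5
FOUND_STATUS=$?

wait_for_marker "$DEMO_DIR/absent" 2
MISSING_STATUS=$?

echo "present: $FOUND_STATUS, absent: $MISSING_STATUS"
```

In the sidecar, the polling loop could call something like `wait_for_marker /lifecycle/main-terminated 25` and then kill the proxy either way, so the proxy shuts down cleanly even if the main container never writes its marker.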

The solution depends on the containers you are running having a shell available to them; this is true of the Cloud SQL proxy (except for 1.16, and from 1.17 onwards only when using the alpine or debian variants), but you may need to make changes to your local container builds to ensure this is true of your own application containers.
