
How to remove Kubernetes 'Shutdown' pods

I recently noticed a big accumulation of pods with status 'Shutdown'. We have been using Kubernetes since October 2020.

Production and staging run on the same nodes, except that staging uses preemptible nodes to cut costs. The containers are also stable in staging (failures occur rarely, as they are caught in testing beforehand).

Service provider: Google Cloud Kubernetes (GKE).

I familiarized myself with the docs and tried searching, but neither I nor Google could make sense of this particular status. There are no errors in the logs.

[Screenshot: a batch of pods in Shutdown status; the pod description only says Failed]

I have no problem with pods being stopped. Ideally I'd like K8s to automatically delete these Shutdown pods. If I run kubectl delete po redis-7b86cdccf9-zl6k9, it goes away in a blink.

kubectl get pods | grep Shutdown | awk '{print $1}' | xargs kubectl delete pod is a manual, temporary workaround.

PS: k is an alias for kubectl in my environment.

Final example: it happens across all namespaces and with different containers. [Screenshot]
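Since the pods span namespaces, a cluster-wide variant of the workaround needs -A plus the namespace column. A sketch, using the same pattern as one of the cleanup scripts below:

kubectl get pods -A | grep Shutdown | awk '{print $1,$2}' | xargs -n2 kubectl delete pod -n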

I stumbled upon a few related issues explaining the status: https://github.com/kubernetes/website/pull/28235 and https://github.com/kubernetes/kubernetes/issues/102820

"When pods were evicted during the graceful node shutdown, they are marked as failed. Running kubectl get pods shows the status of the the evicted pods as Shutdown ." “当 pod 在正常节点关闭期间被驱逐时,它们被标记为失败。运行kubectl get pods将被驱逐的 pod 的状态显示为Shutdown 。”

The evicted pods are not removed on purpose: as the k8s team says here [1], evicted pods are kept so that they can be inspected after eviction.

I believe the best approach here would be to create a CronJob [2], as already mentioned.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: del-shutdown-pods
spec:
  schedule: "* 12 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - kubectl get pods | grep Shutdown | awk '{print $1}' | xargs kubectl delete pod
          restartPolicy: OnFailure

You don't need any grep here; just use the selectors that kubectl provides. And, by the way, you cannot call kubectl from the busybox image, because it doesn't ship with kubectl at all. I also created a service account with the right to delete pods.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: del-shutdown-pods
spec:
  schedule: "0 */2 * * *"  
  concurrencyPolicy: Replace
  jobTemplate:
    metadata:
      name: shutdown-deleter
    spec:
      template:
        spec:
          serviceAccountName: deleter
          containers:
          - name: shutdown-deleter
            image: bitnami/kubectl
            imagePullPolicy: IfNotPresent
            command:
              - "/bin/sh"
            args:
              - "-c"
              - "kubectl delete pods --field-selector status.phase=Failed -A --ignore-not-found=true"
          restartPolicy: Never
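The manifest above references a deleter service account. A minimal sketch of that account and its RBAC; the role and binding names are illustrative, and the namespace assumes the CronJob runs in default:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: deleter
  namespace: default  # assumption: the CronJob lives in the default namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: deleter-role  # illustrative name
rules:
- apiGroups: [""]
  resources: ["pods"]
  # list across all namespaces requires a ClusterRole, since the job deletes with -A
  verbs: ["list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: deleter-binding  # illustrative name
subjects:
- kind: ServiceAccount
  name: deleter
  namespace: default
roleRef:
  kind: ClusterRole
  name: deleter-role
  apiGroup: rbac.authorization.k8s.io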

First, try to forcefully delete the pod using the command below:

$ kubectl delete pod <pod_name> -n <namespace> --grace-period=0 --force

You can also delete the pod directly:

$ kubectl delete pod <pod_name>

Then, check the status of the pods:

$ kubectl get pods

Here, you will see that the pods have been deleted.

You can also verify this against the pod's YAML output.

Most programs shut down gracefully when receiving a SIGTERM, but if you are using third-party code or managing a system you don't control, the preStop hook is a great way to trigger a graceful shutdown without modifying the application. Kubernetes sends a SIGTERM signal to the containers in the pod and then waits for a specified time, called the termination grace period.
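A minimal sketch of wiring this into a pod spec; the image and the preStop command are illustrative, so substitute whatever triggers a graceful shutdown in your application:

apiVersion: v1
kind: Pod
metadata:
  name: graceful-example  # illustrative name
spec:
  # how long Kubernetes waits after SIGTERM before sending SIGKILL
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    image: redis  # example image
    lifecycle:
      preStop:
        exec:
          # runs to completion before the container receives SIGTERM
          command: ["/bin/sh", "-c", "redis-cli shutdown nosave"]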

For more information, refer to the documentation.

Right now Kubernetes doesn't remove Evicted and Shutdown pods by default (the control plane's pod garbage collector only kicks in once the number of terminated pods exceeds the kube-controller-manager's --terminated-pod-gc-threshold, which defaults to 12500). We also faced a similar kind of issue in our environment.

As an automated fix, you can create a Kubernetes CronJob that removes pods in Evicted and Shutdown status. The CronJob's pod can authenticate using a service account with RBAC, where you can restrict the verbs and namespaces available to the utility, as sketched below.
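For example, a namespace-scoped Role and RoleBinding might look roughly like this; all names and the staging namespace are illustrative:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-cleaner
  namespace: staging  # assumption: cleanup restricted to one namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-cleaner
  namespace: staging
subjects:
- kind: ServiceAccount
  name: pod-cleaner-sa  # illustrative service account
  namespace: staging
roleRef:
  kind: Role
  name: pod-cleaner
  apiGroup: rbac.authorization.k8s.io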

You can also use https://github.com/hjacobs/kube-janitor, which provides various configurable options for cleanup.
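A hypothetical kube-janitor rules-file entry for this case; the rule id, the JMESPath expression, and the TTL are all illustrative, so check the kube-janitor README for the exact rules format:

rules:
  - id: clean-shutdown-pods  # illustrative rule id
    resources:
      - pods
    # assumption: pods failed by graceful node shutdown carry status.reason "Shutdown"
    jmespath: "status.reason == 'Shutdown'"
    ttl: 1h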

My take on this problem looks something like this (with inspiration from the other solutions here):

# Delete all Shutdown pods. This is a common problem on Kubernetes when using preemptible nodes on GKE.
# Why grep/awk rather than deleting all Failed pods: https://github.com/kubernetes/kubernetes/issues/54525#issuecomment-340035375
# Deleting by phase=Failed would also remove evicted pods, which would complicate pod troubleshooting.

---
apiVersion: batch/v1  # batch/v1beta1 is deprecated and removed as of Kubernetes 1.25
kind: CronJob
metadata:
  name: del-shutdown-pods
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: shutdown-pod-cleaner
        spec:
          volumes:
          - name: scripts
            configMap:
              name: shutdown-pods-scripts
              defaultMode: 0777
          serviceAccountName: shutdown-pod-sa
          containers:
          - name: zombie-killer
            image: bitnami/kubectl
            imagePullPolicy: IfNotPresent
            command:
              - "/bin/sh"
            args:
              - "-c"
              - "/scripts/podCleaner.sh"
            volumeMounts:
              - name: scripts
                mountPath: "/scripts"
                readOnly: true
          restartPolicy: OnFailure
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: shutdown-pod-cleaner
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["delete", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: shutdown-pod-cleaner-cluster
  namespace: kube-system
subjects:
- kind: ServiceAccount
  name: shutdown-pod-sa
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: shutdown-pod-cleaner
  apiGroup: ""
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: shutdown-pod-sa
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: shutdown-pods-scripts
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
data:
  podCleaner.sh: |
    #!/bin/sh
    if [ $(kubectl get pods --all-namespaces --ignore-not-found=true | grep Shutdown | wc -l) -ge 1 ]
    then
      kubectl get pods -A | grep Shutdown | awk '{print $1,$2}' | xargs -n2 sh -c 'kubectl delete pod -n $0 $1 --ignore-not-found=true'
    else
      echo "no shutdown pods to clean"
    fi
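Note the xargs -n2 sh -c '... $0 $1' trick in the script: xargs feeds two tokens at a time, and the inline shell receives the namespace as $0 and the pod name as $1. The surrounding if guard keeps xargs from invoking kubectl delete with no arguments when there is nothing to clean.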

I just set up a CronJob to clean up the dead GKE pods. The complete setup includes an RBAC cluster role, a role binding, and a service account.

Service account and cluster role setup:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-accessor-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "delete", "watch", "list"]
---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-access
subjects:
- kind: ServiceAccount
  name: cronjob-svc
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: pod-accessor-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cronjob-svc
  namespace: kube-system

CronJob to clean up the dead pods:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-cleaner-cron
  namespace: kube-system
spec:
  schedule: "0 */12 * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          name: pod-cleaner-cron
          namespace: kube-system
        spec:
          serviceAccountName: cronjob-svc
          restartPolicy: Never
          containers:
          - name: pod-cleaner-cron
            imagePullPolicy: IfNotPresent
            image: bitnami/kubectl
            command:
              - "/bin/sh"
            args:
              - "-c"
              - "kubectl delete pods --field-selector status.phase=Failed -A --ignore-not-found=true"
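To verify the CronJob without waiting up to 12 hours, you can trigger a one-off run manually (the job name pod-cleaner-test is arbitrary):

kubectl create job pod-cleaner-test --from=cronjob/pod-cleaner-cron -n kube-system
kubectl logs -n kube-system job/pod-cleaner-test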
