How to delete Kubernetes 'Shutdown' pods

Question

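I recently noticed a large number of pods with the status 'Shutdown'. We have been using Kubernetes since October 2020.

Production and staging run on the same kind of nodes, except that staging uses preemptible nodes to cut costs. The containers are stable in staging as well. (Failures are rare, because they are caught in earlier testing.)

Service provider: Google Cloud Kubernetes.

I read through the documentation and tried searching, but neither the docs nor Google helped with this particular status. There are no errors in the logs.

I have no problem with the pods being stopped. Ideally, I would like K8s to delete these Shutdown pods automatically. If I run kubectl delete po redis-7b86cdccf9-zl6k9, the pod is gone in an instant.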

kubectl get pods | grep Shutdown | awk '{print $1}' | xargs kubectl delete pod is a manual, temporary workaround.

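PS. k is an alias for kubectl in my environment.

One last example: it happens in all namespaces and in different containers.

I stumbled upon some related issues that explain the status: https://github.com/kubernetes/website/pull/28235 https://github.com/kubernetes/kubernetes/issues/102820

"When pods were evicted during the graceful node shutdown, they are marked as failed. Running kubectl get pods shows the status of the evicted pods as Shutdown."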
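
Since the quoted note says kubectl reports these pods with the status Shutdown, one way to pick them out is to match the STATUS column exactly rather than grepping the whole line. The following is a minimal sketch, assuming the default column layout of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...). list_shutdown_pods is a hypothetical helper name, not part of kubectl.

```shell
#!/bin/sh
# list_shutdown_pods: hypothetical helper for illustration.
# Reads the output of `kubectl get pods -A --no-headers` on stdin and prints
# "<namespace> <name>" for every pod whose STATUS column is exactly "Shutdown".
list_shutdown_pods() {
    # Assumed columns with -A: NAMESPACE NAME READY STATUS RESTARTS AGE
    awk '$4 == "Shutdown" { print $1, $2 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -A --no-headers | list_shutdown_pods
```

Matching on the fourth field avoids false positives from pods that merely have "Shutdown" somewhere in their name, which a plain grep would also select.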

Answer 1

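Evicted pods are left in place on purpose. As the k8s team has stated [1], evicted pods are not removed so that they can be inspected after the eviction.

I believe the best approach here is to create a CronJob, as already mentioned [2].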

apiVersion: batch/v1
kind: CronJob
metadata:
  name: del-shutdown-pods
spec:
  schedule: "* 12 * * *"  # fires every minute from 12:00 to 12:59; "0 12 * * *" would run once a day at 12:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
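            image: busybox  # busybox does not ship kubectl; an image that includes it, such as bitnami/kubectl (used in Answer 2), is needed for the command below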
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - kubectl get pods | grep Shutdown | awk '{print $1}' | xargs kubectl delete pod
          restartPolicy: OnFailure

Answer 2

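You don't need any grep; just use the selectors that kubectl provides. And, by the way, you cannot call kubectl from the busybox image, because it simply does not contain kubectl. I also created a service account with permission to delete pods.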

apiVersion: batch/v1
kind: CronJob
metadata:
  name: del-shutdown-pods
spec:
  schedule: "0 */2 * * *"  
  concurrencyPolicy: Replace
  jobTemplate:
    metadata:
      name: shutdown-deleter
    spec:
      template:
        spec:
          serviceAccountName: deleter
          containers:
          - name: shutdown-deleter
            image: bitnami/kubectl
            imagePullPolicy: IfNotPresent
            command:
              - "/bin/sh"
            args:
              - "-c"
              - "kubectl delete pods --field-selector status.phase=Failed -A --ignore-not-found=true"
          restartPolicy: Never

Answer 3

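First, try to force-delete the Kubernetes pod with the following command:

$ kubectl delete pod <pod_name> -n <namespace> --grace-period 0 --force

You can also delete the pod directly:

$ kubectl delete pod <pod_name>

Then check the status of the pods:

$ kubectl get pods

Here you will see that the pod has been deleted.

You can also verify this against the pod's YAML definition.

Most programs shut down gracefully when they receive SIGTERM, but if you are using third-party code or managing a system you do not control, a preStop hook is a good way to trigger a graceful shutdown without modifying the application. Kubernetes sends a SIGTERM signal to the containers in the pod. At that point it waits for a specified time, called the termination grace period, before forcibly killing any container that is still running.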
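
To make the SIGTERM part concrete, here is a minimal sketch of a shell entrypoint that traps the signal and exits cleanly, assuming the container's main process is a POSIX shell script. run_worker and its log messages are hypothetical names chosen for this example; a real workload would do its own cleanup in the handler.

```shell
#!/bin/sh
# Minimal sketch of a container entrypoint that reacts to SIGTERM.
# run_worker and the messages it prints are hypothetical.
run_worker() {
    # When SIGTERM arrives, report it and exit with status 0.
    trap 'echo "SIGTERM received, exiting cleanly"; exit 0' TERM

    echo "worker started"
    while true; do
        # Sleep in the background and wait on it, so the trap runs
        # right away instead of only after the sleep has finished.
        sleep 1 &
        wait $!
    done
}

# As a container entrypoint the script would end with a call to: run_worker
```

If the handler finishes before the termination grace period runs out, the container stops on its own and no forced kill is needed.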


Answer 4

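Currently, Kubernetes does not delete pods in the Evicted or Shutdown status by default. We faced a similar issue in our environment.

As an automatic fix, you can create a Kubernetes CronJob that deletes pods in the Evicted and Shutdown status. The CronJob pod can authenticate with a ServiceAccount and RBAC, which lets you restrict the verbs and namespaces available to the utility.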
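
As a rough illustration of that restriction, the sketch below creates a service account and a Role that only allows listing and deleting pods in a single namespace. The names pod-cleaner and pod-cleaner-sa, the setup_cleaner_rbac function, and the KUBECTL override variable are all assumptions made for this example; the kubectl create subcommands themselves are standard.

```shell
#!/bin/sh
# Minimal sketch: namespace-scoped RBAC for a pod-cleaning CronJob.
# pod-cleaner, pod-cleaner-sa and the KUBECTL override are hypothetical.

# Allow the kubectl binary to be swapped out (for example for a dry run).
KUBECTL="${KUBECTL:-kubectl}"

setup_cleaner_rbac() {
    ns="$1"
    # Service account that the CronJob pod runs as.
    $KUBECTL create serviceaccount pod-cleaner-sa -n "$ns"
    # Role limited to listing and deleting pods in this one namespace.
    $KUBECTL create role pod-cleaner --verb=list --verb=delete --resource=pods -n "$ns"
    # Bind the role to the service account.
    $KUBECTL create rolebinding pod-cleaner --role=pod-cleaner --serviceaccount="$ns:pod-cleaner-sa" -n "$ns"
}

# Example call (not executed here): setup_cleaner_rbac staging
```

A Role with a RoleBinding keeps the permissions inside one namespace. Cleaning up across all namespaces would need a ClusterRole and ClusterRoleBinding instead, as in the later answers.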

Answer 5

You can use https://github.com/hjacobs/kube-janitor. It provides various configurable options for cleaning up resources.

Answer 6

My take on this problem looks like this (inspired by the other solutions):

# Delete all Shutdown pods. This is a common problem on Kubernetes when using preemptible nodes on GKE.
# Why awk instead of selecting failed pods: https://github.com/kubernetes/kubernetes/issues/54525#issuecomment-340035375
# Selecting on Failed would also delete evicted pods, which complicates pod troubleshooting.

---
apiVersion: batch/v1  # batch/v1beta1 CronJob was removed in Kubernetes 1.25
kind: CronJob
metadata:
  name: del-shutdown-pods
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: shutdown-pod-cleaner
        spec:
          volumes:
          - name: scripts
            configMap:
              name: shutdown-pods-scripts
              defaultMode: 0777
          serviceAccountName: shutdown-pod-sa
          containers:
          - name: zombie-killer
            image: bitnami/kubectl
            imagePullPolicy: IfNotPresent
            command:
              - "/bin/sh"
            args:
              - "-c"
              - "/scripts/podCleaner.sh"
            volumeMounts:
              - name: scripts
                mountPath: "/scripts"
                readOnly: true
          restartPolicy: OnFailure
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: shutdown-pod-cleaner
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["delete", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: shutdown-pod-cleaner-cluster
  namespace: kube-system
subjects:
- kind: ServiceAccount
  name: shutdown-pod-sa
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: shutdown-pod-cleaner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: shutdown-pod-sa
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: shutdown-pods-scripts
  namespace: kube-system
  labels:
    app: shutdown-pod-cleaner
data:
  podCleaner.sh: |
    #!/bin/sh
    # Delete Shutdown pods in every namespace, if there are any.
    if [ "$(kubectl get pods --all-namespaces --ignore-not-found=true | grep -c Shutdown)" -ge 1 ]
    then
      # Pass "<namespace> <name>" pairs to kubectl delete, two fields at a time.
      kubectl get pods -A | grep Shutdown | awk '{print $1,$2}' | xargs -n2 sh -c 'kubectl delete pod -n "$0" "$1" --ignore-not-found=true'
    else
      echo "no shutdown pods to clean"
    fi

Answer 7

I just set up a CronJob to clean up dead GKE pods. The full setup consists of an RBAC role, a role binding, and a service account.

Service account and cluster role setup:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-accessor-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "delete", "watch", "list"]
---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-access
subjects:
- kind: ServiceAccount
  name: cronjob-svc
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: pod-accessor-role
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cronjob-svc
  namespace: kube-system

CronJob to clean up the dead pods:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-cleaner-cron
  namespace: kube-system
spec:
  schedule: "0 */12 * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          name: pod-cleaner-cron
          namespace: kube-system
        spec:
          serviceAccountName: cronjob-svc
          restartPolicy: Never
          containers:
          - name: pod-cleaner-cron
            imagePullPolicy: IfNotPresent
            image: bitnami/kubectl
            command:
              - "/bin/sh"
            args:
              - "-c"
              - "kubectl delete pods --field-selector status.phase=Failed -A --ignore-not-found=true"
status: {}


Answer 1: 7 votes, accepted, 2021-09-09 11:51:46
Answer 2: 7 votes, 2021-10-18 13:18:43
Answer 3: 0 votes, 2021-07-15 10:59:39
Answer 4: 0 votes, 2021-09-08 05:47:36
Answer 5: 0 votes, 2021-09-08 07:31:12
Answer 6: 0 votes, 2022-02-06 11:12:48
Answer 7: 0 votes, 2022-07-05 16:10:29
