Unable to delete all evicted pods in all Kubernetes namespaces with a CronJob

Question

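My Kubernetes cluster has memory pressure limits that I need to fix (at a later time).

At any given time there are anywhere from a few evicted pods to dozens. I created a CronJob spec to clean up the evicted pods. I tested the command it runs, and it works fine from PowerShell.

However, whether or not I specify a namespace in the spec, and even if I deploy it to every namespace that exists, the script doesn't seem to delete my evicted pods.

Original script: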

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-evicted-pods
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -"]
          restartPolicy: OnFailure

I also tried creating the script with associated RBAC, with no luck either.

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: development
  name: cronjob-runner
rules:
- apiGroups:
  - extensions
  - apps
  resources:
  - deployments
  verbs:
  - 'patch'

---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
  namespace: development
subjects:
- kind: ServiceAccount
  name: sa-cronjob-runner
  namespace: development
roleRef:
  kind: Role
  name: cronjob-runner
  apiGroup: ""

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-cronjob-runner
  namespace: development
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-all-failed-pods
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-cronjob-runner
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: 
              - /bin/sh
              - -c
              - kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -
          restartPolicy: OnFailure

I realize I should define better memory limits, but this was working before I upgraded k8s from 1.14 to 1.16.

Is there something I'm doing wrong or missing? If it helps, I'm running in Azure (AKS).

Answer 1

It sounds like after the upgrade this command:

kubectl get pods --all-namespaces --field-selector 'status.phase==Failed'

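is no longer picking up your failed pods. It could be:

kubectl/apiserver version mismatch
Credentials/service account permissions
(?)

You can try running a debug pod to verify: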

$ kubectl run -i --tty --rm debug --image=bitnami/kubectl:latest --restart=Never -- get pods --all-namespaces --field-selector 'status.phase==Failed'

Every Job in Kubernetes creates a Pod, so you can also look at the logs of your kubectl-runner pods:

kubectl logs kubectl-runner-xxxxx
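To check what a specific service account is allowed to do, rather than your own credentials, the debug pod can also be written as a manifest. The following is a minimal sketch, assuming the sa-cronjob-runner service account in the development namespace from the question; the pod name rbac-debug is hypothetical. It uses kubectl auth can-i to ask whether that account may list and delete pods across all namespaces.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rbac-debug                         # hypothetical name, not from the question
  namespace: development
spec:
  serviceAccountName: sa-cronjob-runner    # the account whose permissions are checked
  restartPolicy: Never
  containers:
  - name: kubectl
    image: bitnami/kubectl:latest
    command:
    - /bin/sh
    - -c
    # Each check prints "yes" or "no" to the pod log
    - |
      kubectl auth can-i list pods --all-namespaces
      kubectl auth can-i delete pods --all-namespaces
```

Once the pod has completed, kubectl logs rbac-debug -n development shows the two answers.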

Update:

Based on the log files, the default:default service account doesn't have enough permissions. This would fix it:

kubectl create clusterrolebinding myadmin-binding --clusterrole=cluster-admin --serviceaccount=default:default

But if you'd like to be more restrictive, you will have to create a more limited ClusterRole, or a Role if you want it restricted to a single namespace.
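As an illustration of the more limited option, here is a sketch of a ClusterRole that only covers pods, bound to the default service account in the default namespace. The name pod-cleaner is hypothetical, and the verb list is an assumption about what the get-then-delete pipeline needs; adjust it to what your logs report as forbidden.

```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-cleaner                 # hypothetical name
rules:
- apiGroups: [""]                   # "" is the core API group, where pods live
  resources: ["pods"]
  verbs: ["get", "list", "watch", "delete"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-cleaner-default         # hypothetical name
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: pod-cleaner
  apiGroup: rbac.authorization.k8s.io
```

This grants far less than cluster-admin: the account can read and delete pods in every namespace, but nothing else.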

Answer 2

Your Role needs to be changed to a ClusterRole, because you are using --all-namespaces in the kubectl command.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"] # "delete" is required to actually remove the pods

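The RoleBinding you have is for the service account sa-cronjob-runner in the development namespace. But the CronJob you are running is actually in the default namespace, so it uses the default service account from the default namespace.

So either specify the development namespace in the CronJob, along with serviceAccountName: sa-cronjob-runner: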

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-evicted-pods
  namespace: development
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-cronjob-runner
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl get pods --all-namespaces --field-selector 'status.phase==Failed' -o json | kubectl delete -f -"]
          restartPolicy: OnFailure
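For this first option, the sa-cronjob-runner account still has to be bound to the ClusterRole at cluster scope, since a namespaced RoleBinding only grants access inside its own namespace. A minimal sketch, assuming the ClusterRole is named cronjob-runner as above and that it also grants delete on pods; the binding name cronjob-runner-development is hypothetical.

```yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner-development   # hypothetical name
subjects:
- kind: ServiceAccount
  name: sa-cronjob-runner            # service account from the question
  namespace: development
roleRef:
  kind: ClusterRole
  name: cronjob-runner               # the ClusterRole defined in this answer
  apiGroup: rbac.authorization.k8s.io
```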

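Or change the binding so that it binds the ClusterRole to the default service account in the default namespace. Because the command uses --all-namespaces, this needs to be a ClusterRoleBinding; a RoleBinding only grants access inside its own namespace.

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cronjob-runner
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: cronjob-runner
  apiGroup: rbac.authorization.k8s.io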


Answer 1
4 2020-06-30 19:45:41

Answer 2
2 2020-07-01 05:30:27
