Kubernetes pod eviction schedules evicted pod to node already under DiskPressure
We are running a Kubernetes (1.9.4) cluster with 5 masters and 20 worker nodes. Among other pods, this cluster runs one StatefulSet with 3 replicas. Initially, the StatefulSet pods were distributed across 3 nodes. However, pod-2 on node-2 was evicted due to disk pressure on node-2. When pod-2 was evicted, it went to node-1, where pod-1 was already running and which was already experiencing disk pressure.

As we understand it, the kube-scheduler should not schedule a (non-critical) pod onto a node that is already under disk pressure. Is refusing to schedule pods onto a node under disk pressure the default behavior, or is it allowed? We ask because, at the same time, we observed that node-0 had no disk issues. So we were hoping that the pod evicted from node-2 would ideally land on node-0 instead of node-1, which is under disk pressure.

Another observation: when pod-2 on node-2 was evicted, we saw the same pod successfully scheduled, spawned, and moved to the Running state on node-1. However, we still see the "Failed to admit pod" error on node-2 many times for the same evicted pod-2. Is this an issue with the kube-scheduler?
Yes, the scheduler should not assign a new pod to a node with a DiskPressure condition. However, I think you can approach this problem from a few different angles.
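To make the expected behavior concrete, here is a simplified, illustrative model of that filtering step (a hypothetical sketch, not the actual Kubernetes scheduler source): before scoring nodes, the default scheduler filters out any node whose status reports a `DiskPressure=True` condition, so such a node should never be chosen for a non-critical pod.

```python
# Simplified model of the scheduler's disk-pressure predicate
# (illustrative only; NOT actual Kubernetes scheduler code).

def node_has_disk_pressure(node):
    """True if the node reports a DiskPressure condition with status 'True'."""
    return any(
        c["type"] == "DiskPressure" and c["status"] == "True"
        for c in node["status"]["conditions"]
    )

def feasible_nodes(nodes):
    """Keep only nodes a non-critical pod may be scheduled onto."""
    return [n for n in nodes if not node_has_disk_pressure(n)]

# Example node objects shaped like `kubectl get node -o json` conditions:
cluster = [
    {"name": "node-0", "status": {"conditions": [{"type": "DiskPressure", "status": "False"}]}},
    {"name": "node-1", "status": {"conditions": [{"type": "DiskPressure", "status": "True"}]}},
]
print([n["name"] for n in feasible_nodes(cluster)])  # ['node-0']
```

In this model, the situation you describe (the evicted pod landing on node-1) would only happen if node-1's DiskPressure condition was not yet reported as True at scheduling time, which is worth checking in the node's status history.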
Look into the configuration of your scheduler:

./kube-scheduler --write-config-to kube-config.yaml

and check whether it needs any adjustments. You can find info about additional options for kube-scheduler here:

You can also configure additional scheduler(s) depending on your needs. A tutorial for that can be found here.
Check the logs:

kubectl logs : kube-scheduler event logs
journalctl -u kubelet : kubelet logs
/var/log/kube-scheduler.log (on the master)

Look more closely at the kubelet's eviction thresholds (soft and hard) and at how much node memory capacity is set.
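A hard eviction threshold such as `nodefs.available<10%` (one of the kubelet defaults in this era of Kubernetes) is what causes a node to report DiskPressure and start evicting pods. The evaluation can be sketched as follows (a simplified model under that assumption; the real logic lives in the kubelet's eviction manager):

```python
# Illustrative evaluation of a kubelet-style hard eviction threshold,
# e.g. "nodefs.available<10%" (simplified model, not kubelet source).

def threshold_crossed(available_bytes, capacity_bytes, threshold):
    """threshold is a percentage of capacity ('10%') or an absolute byte count."""
    if threshold.endswith("%"):
        limit = capacity_bytes * float(threshold[:-1]) / 100.0
    else:
        limit = float(threshold)
    return available_bytes < limit

GiB = 1024 ** 3
# A 100 GiB disk with only 5 GiB free crosses nodefs.available<10%:
print(threshold_crossed(5 * GiB, 100 * GiB, "10%"))   # True
# With 20 GiB free it does not:
print(threshold_crossed(20 * GiB, 100 * GiB, "10%"))  # False
```

Comparing your nodes' actual free-disk percentages against the configured thresholds should tell you whether node-1 was genuinely below the threshold when the evicted pod was rescheduled.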
Please check out my suggestions and let me know if they helped.