Kubernetes pod eviction schedules evicted pod to node already under DiskPressure

We are running a Kubernetes (1.9.4) cluster with 5 masters and 20 worker nodes. Among the other pods in this cluster we run one StatefulSet with 3 replicas. Initially the StatefulSet pods were distributed across 3 nodes. However, pod-2 on node-2 got evicted due to disk pressure on node-2. When pod-2 was evicted it went to node-1, where pod-1 was already running and which was already experiencing disk pressure. As per our understanding, the kube-scheduler should not schedule a (non-critical) pod onto a node that is already under disk pressure. Is not scheduling pods onto a node under disk pressure the default behavior, or is it allowed? At the same time we observe that node-0 has no disk issues at all, so we were hoping that the pod evicted from node-2 would ideally come up on node-0 instead of node-1, which is under disk pressure.

Another observation we had: when pod-2 on node-2 was evicted, we see that the same pod was successfully scheduled, spawned and moved to the Running state on node-1. However, we still see the "Failed to admit pod" error on node-2 many times for the same evicted pod-2. Is this an issue with the kube-scheduler?

Yes, the scheduler should not assign a new pod to a node with a DiskPressure condition.
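
To double-check what the scheduler could actually see, it is worth looking at the DiskPressure condition each node was reporting around the time pod-2 was rescheduled. A minimal check (just a sketch; node-1 is the node name from your description):

    # List every node together with the current status of its DiskPressure condition
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'

    # Inspect a single node in detail: conditions, taints, capacity and allocatable resources
    kubectl describe node node-1

If node-1 only started reporting DiskPressure=True after pod-2 had already been bound to it, the scheduling decision was technically correct at that moment; in 1.9 the filtering is done by the scheduler's CheckNodeDiskPressure predicate, as far as I know, and it only acts on the condition as reported at scheduling time.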

However, I think you can approach this problem from a few different angles.

  1. Look into the configuration of your scheduler:

    • ./kube-scheduler --write-config-to kube-config.yaml

and check whether it needs any adjustments (a quick way to review the dumped file is sketched after this list). You can find info about additional options for kube-scheduler here.

  2. You can also configure additional scheduler(s) depending on your needs. A tutorial for that can be found here.

  3. Check the logs:

    • kubectl logs : kube-scheduler event logs
    • journalctl -u kubelet : kubelet logs
    • /var/log/kube-scheduler.log (on the master)

  4. Look more closely at the kubelet's eviction thresholds (soft and hard) and at how much node memory capacity is set (see the flag sketch after this list).

  5. Bear in mind that:

    • the kubelet may not observe resource pressure fast enough, or
    • the kubelet may evict more pods than needed due to the stats collection timing gap.
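
For point 1, a quick way to see what the scheduler is actually running with (a sketch; the manifest path below is an assumption that only holds if the scheduler runs as a static pod, e.g. in a kubeadm-style setup):

    # On a master, dump the scheduler's effective configuration to a file and exit
    ./kube-scheduler --write-config-to kube-config.yaml

    # If the scheduler runs as a static pod, its flags and any policy file are visible in the manifest
    cat /etc/kubernetes/manifests/kube-scheduler.yaml

What you want to verify is that no custom policy or algorithm provider is in place that drops the default node-pressure checks.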
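
For point 4 (and the timing caveats in point 5), the relevant knobs are kubelet flags. The values below are only illustrative, not recommendations; compare them with what your kubelets actually run with (for example via ps aux | grep kubelet or the kubelet systemd unit):

    # Hard thresholds: crossing any of these triggers immediate evictions
    --eviction-hard=memory.available<100Mi,nodefs.available<10%,imagefs.available<15%

    # Soft thresholds: evictions happen only after the grace period has expired
    --eviction-soft=nodefs.available<15%
    --eviction-soft-grace-period=nodefs.available=2m

    # How long the node keeps reporting DiskPressure after the signal drops back below the threshold
    --eviction-pressure-transition-period=5m

    # Reclaim a bit extra per eviction so the kubelet does not evict pod after pod
    --eviction-minimum-reclaim=nodefs.available=1Gi

A longer --eviction-pressure-transition-period also keeps the DiskPressure condition (and hence the scheduler's filter) in place for longer, which helps when a node flaps in and out of pressure and briefly looks schedulable again.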

Please check out my suggestions and let me know if they helped.
