
GKE node pool with Autoscaling does not scale down

I have a GKE cluster with two node pools. I turned on autoscaling on one of them, but it does not seem to scale down automatically.

[Screenshot: autoscaling enabled]

I have enabled HPA, and that works fine: it scales the pods down to 1 when there is no traffic.

The API is currently not receiving any traffic, so I would expect the nodes to scale down as well.

But it still runs the maximum of 5 nodes, even though some nodes use less than 50% of their allocatable memory/CPU.

[Screenshot: 5 nodes]

What did I miss here? I am planning to move these pods to bigger machines, but to do that I need node autoscaling to work so I can control the monthly cost.

There are many reasons that can cause Cluster Autoscaler (CA) to fail to scale down. To recap how this should normally work:

  • Cluster Autoscaler periodically checks (every 10 seconds) the utilization of the nodes.
  • If a node's utilization factor is less than 0.5, the node is considered underutilized.
  • The node is then marked for removal and monitored for the next 10 minutes to make sure its utilization factor stays below 0.5.
  • If it is still underutilized after 10 minutes, the node is removed by Cluster Autoscaler.
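Before digging into specific causes, it can help to see what the autoscaler itself reports. On GKE the autoscaler publishes a status configmap in `kube-system` (this is a sketch assuming your kubeconfig points at the affected cluster; verify the configmap is exposed on your GKE version):

```shell
# Cluster Autoscaler status: per-node-pool health and scale-down candidates.
kubectl describe configmap cluster-autoscaler-status --namespace=kube-system

# Recent scale-up/scale-down decisions are recorded as events on that configmap:
kubectl get events --namespace=kube-system \
  --field-selector involvedObject.name=cluster-autoscaler-status
```

The events often name the exact pod that blocks a scale-down, which narrows the search considerably.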

If the above is not happening, then something else is preventing your nodes from scaling down. In my experience, PDBs need to be applied to kube-system pods, and I would say that could be the reason; however, there are many possible causes. Here are the ones that commonly cause downscaling issues:

1. PDB is not applied to your kube-system pods. kube-system pods prevent Cluster Autoscaler from removing the nodes they run on. You can manually add a Pod Disruption Budget (PDB) for the kube-system pods that can be safely rescheduled elsewhere, with a command like:

`kubectl create poddisruptionbudget PDB-NAME --namespace=kube-system --selector app=APP-NAME --max-unavailable 1`
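As a concrete (hypothetical) example, the equivalent PDB expressed as a manifest, targeting kube-dns — the name `kube-dns-pdb` is illustrative, and you should check the actual labels on your pods before applying it:

```yaml
# Hypothetical PDB allowing CA to evict kube-dns pods one at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns   # verify this label matches your pods
```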

2. Containers using local storage (volumes), even empty volumes. Kubernetes prevents scale-down events on nodes whose pods use local storage. Look for this kind of configuration, as it prevents Cluster Autoscaler from scaling down the node.
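For example, a pod spec like the following (names and image are illustrative) would block scale-down of its node because of the `emptyDir` volume; annotating the pod as safe to evict overrides that:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod   # illustrative name
  annotations:
    # Without this annotation, the emptyDir below prevents CA from removing the node:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
  - name: app
    image: nginx    # placeholder image
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}    # local storage, even when empty, blocks scale-down by default
```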

3. Pods annotated with `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`. Look for pods with this annotation, as it prevents node scale-down.
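One way to find such pods is to filter the annotation with `jq` (a sketch that assumes `jq` is installed; the filter itself is the part doing the work):

```shell
# List namespace/name of every pod annotated safe-to-evict: "false".
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
           | select(.metadata.annotations["cluster-autoscaler.kubernetes.io/safe-to-evict"] == "false")
           | "\(.metadata.namespace)/\(.metadata.name)"'
```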

4. Nodes annotated with `cluster-autoscaler.kubernetes.io/scale-down-disabled: true`. Look for nodes with this annotation, as it prevents Cluster Autoscaler from removing them.

These are the configurations I suggest you check in order to make your cluster scale down underutilized nodes.
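To spot such nodes, one option is kubectl's `custom-columns` output (the escaped dots in the annotation key are required by kubectl's JSONPath-like column syntax; verify against your kubectl version):

```shell
# Show each node and the value of its scale-down-disabled annotation ("<none>" if unset):
kubectl get nodes -o custom-columns=\
'NAME:.metadata.name,SCALE_DOWN_DISABLED:.metadata.annotations.cluster-autoscaler\.kubernetes\.io/scale-down-disabled'

# Remove the annotation (NODE_NAME is a placeholder) so CA can consider the node again:
kubectl annotate node NODE_NAME cluster-autoscaler.kubernetes.io/scale-down-disabled-
```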

You can also see this page, which explains the configurations that prevent downscaling; that may be what is happening to you.
