Does setting "desired size: 0" prevent cluster-autoscaler from scaling up a managed node group?
I have an AWS managed node group that behaves unexpectedly when I set both the desired size and the minimum size to 0. I would expect the managed node group to start with no nodes, but that once I attempt to schedule a pod using a nodeSelector with the label `eks.amazonaws.com/nodegroup: my-node-group-name`, the cluster-autoscaler would set the desired size for the managed node group to 1, and a node would be booted.
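For concreteness, a minimal pod manifest of the kind described would look something like this (the pod name, container name, and image are placeholders, not from the original question):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nodegroup-scale-test   # placeholder name
spec:
  # Pin the pod to nodes from the managed node group; this is the
  # selector the cluster-autoscaler must be able to match against.
  nodeSelector:
    eks.amazonaws.com/nodegroup: my-node-group-name
  containers:
    - name: app                # placeholder container
      image: busybox:latest    # placeholder image
      command: ["sleep", "3600"]
```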
However, the cluster-autoscaler logs indicate that the pending pod does not trigger a scale-up because it would not be schedulable: `pod didn't trigger scale-up (it wouldn't fit if a new node is added)`.
。 When I go set desired size to 1 in the managed node group manually however, the pod is scheduled successfully, so I know the nodeSelector works fine.但是,当我 go 在托管节点组中手动将所需大小设置为 1 时,pod 已成功调度,因此我知道 nodeSelector 工作正常。
I thought this might be a labelling issue, as described here, but I have the labels on my managed node groups set to be auto-discoverable:
```yaml
spec:
  containers:
    - command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --namespace=kube-system
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster-name
        - --balance-similar-node-groups=true
        - --expander=least-waste
        - --logtostderr=true
        - --skip-nodes-with-local-storage=false
        - --skip-nodes-with-system-pods=false
        - --stderrthreshold=info
        - --v=4
```
I have set the same tags on the autoscaling group:
| Key | Value | Tag new instances |
|---|---|---|
| eks:cluster-name | my-cluster-name | Yes |
| eks:nodegroup-name | my-node-group-name | Yes |
| k8s.io/cluster-autoscaler/enabled | true | Yes |
| k8s.io/cluster-autoscaler/my-cluster-name | owned | Yes |
| kubernetes.io/cluster/my-cluster-name | owned | Yes |
Am I missing something? Or is this expected behavior when setting the desired size to 0?
Ugh, it turns out this is just an AWS incompatibility with the cluster-autoscaler that they don't tell you about. You can scale your managed node group down to zero, but without a workaround, you can't scale it back up.
For the cluster-autoscaler to scale a node group up from 0, it constructs a pseudo node based on the node group's specification, in this case the AWS autoscaling group. For the cluster-autoscaler to know what labels to put on that pseudo node when checking whether a pending pod could be scheduled, you need to add a specific tag to the node group.
Sadly, AWS does not add this tag to the autoscaling group for you, and also does not propagate tags from the managed node group to the autoscaling group. The only way to make this work is to add the tag to the autoscaling group yourself after it has been created by the managed node group. The issue is tracked here.
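As a sketch of the workaround: the cluster-autoscaler reads tags of the form `k8s.io/cluster-autoscaler/node-template/label/<label-name>` on the autoscaling group to learn which labels new nodes will carry. Applying one with the AWS CLI might look like this (the ASG name below is a placeholder; find the real one under the managed node group's resources):

```shell
# Tag the autoscaling group that backs the managed node group so the
# cluster-autoscaler knows which labels the not-yet-existing nodes will have.
# "eks-my-node-group-asg" is a placeholder ASG name.
aws autoscaling create-or-update-tags --tags \
  "ResourceId=eks-my-node-group-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/nodegroup,Value=my-node-group-name,PropagateAtLaunch=true"
```

With this tag in place, the pseudo node the autoscaler constructs carries the `eks.amazonaws.com/nodegroup` label, so the pending pod's nodeSelector matches and a scale-up from 0 is triggered.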
EKS now supports this with Cluster Autoscaler: https://realz.medium.com/reduce-amazon-eks-cost-by-scaling-node-groups-to-zero-41dce9db50ef