
Kubernetes node CPU utilization

Asked 2020-07-29 21:56:53 · Tags: kubernetes, autoscaling, aws-auto-scaling, hpa

I'm trying to learn the best way to utilize CPU (and RAM) on k8s nodes. My final goal is to make sure CPU utilization on each node in the cluster is above X%.

So far I've read about cluster-autoscaler and HPA, but I'm not sure whether they'd help with this use case.

From what I've read:

cluster-autoscaler autoscales nodes by comparing replica count and resources.requests against the available CPU on the target EC2 instance - it is NOT based on traffic or actual CPU usage.
HPA is based on actual CPU usage, but for individual pods.
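
To make the HPA side concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation (desired = ceil(current replicas * current metric / target metric)). The function name and parameters are made up for illustration, and the 0.1 tolerance is assumed to be the controller default. For the CPU utilization metric, the percentages are relative to the pods' CPU requests, not to the node.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         tolerance: float = 0.1) -> int:
    """Illustrative helper (not a Kubernetes API): replica count the HPA would aim for."""
    ratio = current_utilization / target_utilization
    # The HPA leaves the replica count alone when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 pods averaging 90% of their CPU request, with a 60% target
print(hpa_desired_replicas(4, 90, 60))
```

Note that this only changes the number of pods. Whether the nodes end up busier depends on where the scheduler places those pods.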

I essentially want to get to a point where kubectl top nodes shows all nodes using > X% (let's say 60%) - and ideally trigger autoscaling if we reach X2% (let's say 80%).
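
As a rough way to check that goal, the sketch below groups nodes by the CPU% column of kubectl top nodes text output. The node names and numbers in SAMPLE are invented, the function name is hypothetical, and the column layout (NAME, CPU(cores), CPU%, ...) is assumed to match your kubectl version.

```python
def classify_nodes(top_output: str, low: int = 60, high: int = 80) -> dict:
    """Group node names by CPU% parsed from `kubectl top nodes` text output."""
    groups = {"under": [], "ok": [], "over": []}
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3 or not fields[2].endswith("%"):
            continue  # skip rows without a usable CPU% value
        name, pct = fields[0], int(fields[2].rstrip("%"))
        if pct < low:
            groups["under"].append(name)
        elif pct >= high:
            groups["over"].append(name)
        else:
            groups["ok"].append(name)
    return groups


# Invented sample output, for illustration only
SAMPLE = """\
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-a       350m         17%    2100Mi          28%
node-b       1300m        65%    5200Mi          70%
node-c       1700m        85%    6100Mi          82%
"""

print(classify_nodes(SAMPLE))
```

In practice you would feed it the captured output of kubectl top nodes instead of SAMPLE.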

Any suggestions or pointers on how to go about this use case? (Or should I somehow use a combination of these two autoscaling mechanisms?)

1 answer

You can use a combination of the HPA, the cluster autoscaler, and/or the cloud provider's autoscaling group.

HPA. It is based on the CPU/memory of your pods and scales your K8s Deployments (for example) up and down.
Cloud provider ASG or autoscaling group. It operates on the VMs or instances, which you can scale up and down based on their own CPU and memory metrics.
Cluster autoscaler. It kicks in when pods are pending and have nowhere to run, but if you are handling the case above, this is more of a fail-safe mechanism, or perhaps suited to workloads that don't need to come up very quickly.
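
To illustrate the pending-pod trigger, here is a deliberately simplified sketch that estimates how many new nodes a set of pending pods would need, using first-fit-decreasing packing on CPU requests only. This is not the cluster autoscaler's actual algorithm (which simulates the scheduler and considers memory, node groups, taints and more), and the function name and millicore figures are assumptions for illustration.

```python
from typing import List


def nodes_needed(pending_cpu_requests_m: List[int], node_allocatable_m: int) -> int:
    """Estimate new nodes needed for pending pods (CPU requests in millicores)."""
    free: List[int] = []  # remaining millicores on each hypothetical new node
    for req in sorted(pending_cpu_requests_m, reverse=True):
        if req > node_allocatable_m:
            raise ValueError(f"a {req}m request can never fit on a {node_allocatable_m}m node")
        for i, room in enumerate(free):
            if req <= room:
                free[i] -= req  # place the pod on the first node with room
                break
        else:
            free.append(node_allocatable_m - req)  # no room anywhere: add a node
    return len(free)


# Four pending pods on nodes with 2000m allocatable CPU each
print(nodes_needed([500, 1500, 1000, 700], 2000))
```

The point is that the input is what pods request, not what they actually consume, which is why this mechanism alone does not guarantee any particular kubectl top nodes reading.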

In summary, you can use all 3 of the above (or fewer), but you have to see what works for you so that they don't conflict with each other. One potential problem: if your cloud ASG starts scaling down while you also have pods in the Pending state, the cluster autoscaler (if enabled) will kick in, and the two may try to do opposite things, leaving your cluster unable to schedule any pods.

✌️☮️