How to use K8S HPA and autoscaler when Pods normally need low CPU but periodically scale

I am trying to determine a reliable setup to use with K8S to scale one of my deployments using an HPA and an autoscaler. I want to minimize the amount of resources overcommitted but allow it to scale up as needed.

I have a deployment that is managing a REST API service. Most of the time the service will have very low usage (0m-5m cpu). But periodically through the day or week it will spike to much higher usage on the order of 5-10 CPUs (5000m-10000m).

My initial pass at configuring this is:

  • Deployment: 1 replica
"resources": {
   "requests": {
     "cpu": 0.05
   },
   "limits": {
      "cpu": 1.0
   }
}
  • HPA:
"spec": {
   "maxReplicas": 25,
   "metrics": [
      {
         "resource": {
         "name": "cpu",
         "target": {
            "averageValue": 0.75,
            "type": "AverageValue"
         }
         },
         "type": "Resource"
      }
   ],
   "minReplicas": 1,
   ...
}

This is running on an AWS EKS cluster with the cluster autoscaler running. All instances have 2 CPUs. The goal is that as the CPU usage goes up, the HPA will allocate a new pod that will be unschedulable, and then the autoscaler will allocate a new node. As I add load on the service, the CPU usage for the first pod spikes up to approximately 90-95% at max.

I am running into two related problems:

  1. Small request size

By using such a small request value (cpu: 0.05), the newly requested pods can be easily scheduled on the current node even when it is under high load. Thus the autoscaler never finds a pod that can't be scheduled and never allocates a new node. I could increase the request size and overcommit, but this then means that for the vast majority of the time, when there is no load, I will be wasting resources I don't need.
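
As a rough sketch of what the scheduler sees (the allocatable figure is an assumed value for a 2-vCPU EKS node after system reservations, not a measurement):

node allocatable CPU (assumed):   ~1930m
request per new pod:               50m   (cpu: 0.05)
pods that fit by request math:     ~38
actual usage of the busy pod:      ~900-950m (90-95% of its 1-core limit)

The scheduler only compares requests against allocatable, so the node looks almost empty even while it is saturated, and no pod ever goes pending to trigger the cluster autoscaler.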

  2. Average CPU decreases as more pods are allocated

Because the pods all get allocated on the same node, once a new pod is allocated it starts sharing the node's available 2 CPUs. This in turn reduces the amount of CPU used by the pod and thus keeps the average value below the 75% target.

(ex: 3 pods, 2 CPUs ==> max 66% average CPU usage per pod)
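
To spell the arithmetic out (with "type": "AverageValue" the 0.75 target is an absolute per-pod quantity of 750m, not a percentage of the request):

total node CPU / pods = 2000m / 3 ≈ 666m average per pod
666m < 750m target (averageValue: 0.75)  =>  the HPA sees no reason to add more replicas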

I am looking for guidance here on how I should be thinking about this problem. I think I am missing something simple.

My current thought is that what I am looking for is a way for the Pod resource request value to increase under heavier load and then decrease back down when the system doesn't need it. That would point me toward using something like a VPA, but everything I have read says that using HPA and VPA at the same time leads to very bad things.

I think increasing the request from 0.05 to something like 0.20 would probably let me handle the case of scaling up. But this will in turn waste a lot of resources and could suffer issues if the scheduler finds space on an existing node. My example is about one service, but there are many more services in the production deployment. I don't want to have nodes sitting empty with committed resources but no usage.
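
For illustration, the bumped-up request mentioned above would just be the same resources block with a larger value (keeping the question's numeric-quantity style):

"resources": {
   "requests": {
      "cpu": 0.20
   },
   "limits": {
      "cpu": 1.0
   }
}

Even at 0.20, roughly nine such pods still fit on a 2-CPU node by request math, so the scheduler may still place new replicas on the existing node rather than leaving them pending.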

What is the best path forward here?

Sounds like you need a scheduler that takes actual CPU utilization into account. This is not supported yet.

There seems to be work on this feature: KEP - Trimaran: Real Load Aware Scheduling using TargetLoadPackin plugin. Also see New scheduler priority for real load average and free memory.

In the meanwhile, if the CPU limit is 1 core and the nodes autoscale under high CPU utilization, it sounds like it should work if the nodes are substantially bigger than the CPU limits for the pods. E.g. try with nodes that have 4 cores or more, and possibly a slightly larger CPU request for the Pod?
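
A rough way to see why bigger nodes help (illustrative assumptions: a 4-vCPU node such as an m5.xlarge with roughly 3900m allocatable, and a slightly larger request of 200m):

2-vCPU node:  1-core limit ≈ 50% of the node  =>  2 bursting pods saturate it
4-vCPU node:  1-core limit ≈ 25% of the node  =>  4 pods can burst to their full limit
200m request on ~3900m allocatable            =>  ~19 pods fit by request math, so replicas
                                                  go pending (and trigger the autoscaler)
                                                  sooner than with 50m requests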
