
HPA Scaling even though Current CPU is below Target CPU

I am playing around with the Horizontal Pod Autoscaler in Kubernetes. I've set the HPA to start new instances once the average CPU utilization passes 35%. However, this does not seem to work as expected. The HPA triggers a rescale even though the CPU utilization is far below the defined target utilization. As seen below, the "current" utilization is 10%, which is a long way from 35%. But it still rescaled the number of pods from 5 to 6.

[screenshot: HPA status showing current utilization of 10% against the 35% target]

I've also checked the metrics in my Google Cloud Platform dashboard (where we host the application). This also shows that the requested CPU utilization hasn't surpassed the 35% threshold. But still, several rescales occurred.

[screenshot: GCP dashboard showing CPU utilization below the 35% threshold]

The content of my HPA:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
{{ if eq .Values.env "prod" }}
  minReplicas: 5
  maxReplicas: 35
{{ else if eq .Values.env "staging" }}
  minReplicas: 1
  maxReplicas: 3
{{ end }}
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  targetCPUUtilizationPercentage: 35

Does anyone know what the cause of this might be?

Scaling is based on the percentage of requests, not limits. I think this answer needs correcting, because the examples in the accepted answer show:

limits:
  cpu: 1000m

But targetCPUUtilizationPercentage is based on requests, like:

requests:
  cpu: 1000m
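
To make the difference concrete, here is a minimal sketch of a Deployment that sets both values; the names and figures (django-app, 500m/1000m) are illustrative assumptions, not taken from the question. With a request of 500m, a 35% utilization target means the HPA reacts once average usage per pod approaches roughly 175m, regardless of the 1000m limit.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: django
  template:
    metadata:
      labels:
        app: django
    spec:
      containers:
        - name: django
          image: registry.example.com/django:latest   # hypothetical image
          resources:
            requests:
              cpu: 500m    # HPA utilization is computed against this value
            limits:
              cpu: 1000m   # throttling ceiling; not used for the HPA percentage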

For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work
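
The replica count itself comes out of the ratio described above. As a rough sketch with illustrative numbers (not the exact figures from the question):

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )

For example, with 5 replicas and a momentary average of 40% against the 35% target, ceil(5 * 40 / 35) = ceil(5.71) = 6, so even a brief excursion just above the target is enough to add one pod.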

This is tricky and could be a bug, but I don't think it is; most of the time people simply configure values that are too low, as I'll explain.

How targetCPUUtilizationPercentage relates to a Pod's resource limits

The targetCPUUtilizationPercentage configures a percentage based on all the CPU a pod can use. On Kubernetes, we can't create an HPA without specifying some limits on CPU usage.

Let's assume that these are our limits:

apiVersion: v1
kind: Pod
metadata:
  name: apache
spec:
  containers:
    - name: apache
      image: httpd:alpine
      resources:
        limits:
          cpu: 1000m

And in the targetCPUUtilizationPercentage inside the HPA we specify 75%.
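
For reference, a minimal sketch of the matching HPA might look like the block below. Note that an HPA targets a scalable controller such as a Deployment rather than the bare Pod shown above, so the Deployment name apache here is an assumption.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: apache
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: apache    # assumed Deployment running the apache container
  targetCPUUtilizationPercentage: 75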

That is easy to explain: we ask for 100% (1000m = 1 CPU core) of a single core, so when this core reaches about 75% usage, the HPA will start to work.

But if we define our limits like this:

spec:
  containers:
    - name: apache
      image: httpd:alpine
      resources:
        limits:
          cpu: 500m

Now, 100% of the CPU our pod can use is only 50% of a single core. So 100% CPU usage from this pod means, on the hardware, 50% usage of a single core.

This makes no difference to targetCPUUtilizationPercentage: if we keep our value of 75%, the HPA will start to work when our single core reaches about 37.5% usage, because that is 75% of all the CPU this pod can consume.
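
Restating the answer's arithmetic (which treats the limit as the 100% reference point):

75% target, cpu limit 1000m  ->  0.75 * 1000m = 750m   (about 75% of one core)
75% target, cpu limit  500m  ->  0.75 *  500m = 375m   (about 37.5% of one core)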

From the perspective of the pod/HPA, they never know that they are limited on CPU or memory.

Understanding the scenario in the question above

With some programs, like the one used in the question above, CPU spikes do occur, but only in small time frames (for example, 10-second spikes). Due to the short duration of these spikes, the metrics server doesn't save the spike itself; it only saves the metric after a 1-minute window, so a spike that falls between such windows is excluded. This explains why the spike cannot be seen in the metrics dashboards, but is still picked up by the HPA.

Thus, for services with low CPU limits, a larger scale-up time window (the scaleUp behavior settings in the HPA) can be ideal.
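
If the cluster supports the autoscaling/v2 API, that scale-up behavior can be tuned roughly as in the sketch below; the window and policy values are illustrative assumptions, not recommendations.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: django
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: django-app
  minReplicas: 5
  maxReplicas: 35
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 35
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 120   # wait before reacting to short spikes (illustrative)
      policies:
        - type: Pods
          value: 1                      # add at most one pod per period (illustrative)
          periodSeconds: 60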
