
Kubernetes HPA - Scale up cooldown

I am running a Kubernetes cluster v1.16 (currently the newest version on GKE) with an HPA that scales deployments based on custom metrics (specifically the RabbitMQ message count fetched from Google Cloud Monitoring).

The Problem

The deployments scale up very fast to the maximum pod count when the message count is temporarily high.

Information

The HPA --horizontal-pod-autoscaler-sync-period is set to 15 seconds on GKE and, as far as I know, can't be changed.

My custom metrics are updated every 30 seconds.

I believe that what causes this behavior is that when there is a high message count in the queues, the HPA triggers a scale-up every 15 seconds, and after a few cycles it reaches the maximum pod capacity.
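For reference, the replica count the HPA computes on every sync period follows roughly the formula documented for the HPA algorithm (the numbers below are purely illustrative):

    desiredReplicas = ceil[ currentReplicas * ( currentMetricValue / targetValue ) ]

So if the queue temporarily sits at four times the target value, 6 replicas become 24 after one evaluation, and the next evaluation can already push the deployment toward maxReplicas.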

In the Kubernetes v1.18 API you can control the scale-up stabilization time, but I can't find a similar feature in v1.16.
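For context, the v1.18 feature I mean is the behavior block in autoscaling/v2beta2; a minimal sketch with illustrative values would look like this (not available on my v1.16 cluster):

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 300   # smooth scale-up decisions over a 5-minute window
      policies:
      - type: Pods
        value: 4            # add at most 4 pods...
        periodSeconds: 60   # ...per 60-second period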

My Question

How can I make the HPA scale up more gradually?

Edit 1

A sample HPA from one of my deployments:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 6
  maxReplicas: 100
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|rabbit_mq|v1-compare|messages_count"
      metricSelector:
        matchLabels:
          metric.labels.name: production
      targetValue: 500
  

We've built an open source Custom HPA which is highly configurable.
Specifically for your case, you can set the HPA to cool down between scale-down events.

In order to use the custom HPA, all you need to do is:

# add the nanit helm repo
helm repo add nanit https://nanit.github.io/helm-charts

# install the chart in the cluster
helm install nanit/custom-hpa \
  --version 1.0.7 \
  --set target.deployment=<deployment> \
  --set target.namespace=<namespace> \
  --set target.value=100 \
  --set minReplicas=10 \
  --set maxReplicas=50 \
  --set behavior.scaleDownCooldown=120 \
  --set prometheus.url=<prometheus-url> \
  --set prometheus.port=<prometheus-port> \
  --set prometheus.query=<prometheus-target-metric>

The setting you are looking for is behavior.scaleDownCooldown, which dictates the time in seconds that the HPA should wait before scaling down again.

At the moment the custom HPA only supports Prometheus as a metric provider, but you can use a RabbitMQ exporter and set queue_messages_ready as the target metric.
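As a rough sketch, assuming the commonly used exporter metric name rabbitmq_queue_messages_ready and a hypothetical queue label, the query parameter could be set like this:

  --set prometheus.query='sum(rabbitmq_queue_messages_ready{queue="my-queue"})'

Adjust the metric and label names to whatever your exporter actually exposes.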

First, a good piece of information to know is that there is a built-in cooldown in Kubernetes for autoscalers. Quoting from Kubernetes in Action:

Currently, a scale-up will occur only if no rescaling event occurred in the last three minutes. A scale-down event is performed even less frequently—every five minutes. Keep this in mind so you don't wonder why the autoscaler refuses to perform a rescale operation even if the metrics clearly show that it should.

It might be that this statement is outdated, but unless it changed, this is hardcoded, and each scale-up/down event should not scale more than 100% of the existing pods.

That said, you're not out of options either way. Here are some approaches you can take:

  1. Pass your custom metric for scaling through a time-average function - the last time I did this was using Prometheus, and the PromQL might be different from what you are using, but if you share more configuration in your question, I'm sure I could help find the syntax (see the sketch after this list).
  2. You can try using Keda - it has a cooldownPeriod field that you can place in the ScaledObject custom resource it comes with (see the example after this list).
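For option 1, a minimal PromQL sketch, assuming the exporter metric rabbitmq_queue_messages_ready and a hypothetical queue name, would be to feed the HPA an averaged series instead of the raw one:

  avg_over_time(rabbitmq_queue_messages_ready{queue="my-queue"}[5m])

Averaging over a 5-minute window smooths short spikes, so a temporary burst no longer drives every 15-second evaluation toward the maximum.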
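For option 2, here is a minimal ScaledObject sketch with a RabbitMQ trigger; the API version, trigger fields, and names below are assumptions, so check the Keda docs for the version you install:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-deployment-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: my-deployment        # the Deployment Keda should scale
  pollingInterval: 30          # how often the queue is checked, in seconds
  cooldownPeriod: 300          # seconds to wait after the last active trigger before scaling back down
  minReplicaCount: 6
  maxReplicaCount: 100
  triggers:
  - type: rabbitmq
    metadata:
      queueName: my-queue      # hypothetical queue name
      mode: QueueLength
      value: "500"
      hostFromEnv: RABBITMQ_HOST  # AMQP connection string taken from an env var on the target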
