Google Cloud GKE horizontal pod autoscaling based on Kubernetes metrics

Question

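I want to use the standard Kubernetes metric for pod network received bytes in an HPA. I am using the YAML below to do this, but I get errors such as: unable to get metric from the custom metrics API: the custom metrics API (custom.metrics.k8s.io) is not registered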

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
  namespace: xxxxx
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: xxxx-xxx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: received_bytes_count
      targetAverageValue: 20k

If anyone has experience using this type of metric, that would be very helpful.

Answer 1

autoscaling/v1 is an API that autoscales based on CPU utilization only. To autoscale on other metrics, you should use autoscaling/v2beta2. I recommend reading this documentation to check the API versions.

Answer 2

Solution

To make this work, you need to deploy the Stackdriver Custom Metrics Adapter. Use the commands below to deploy it.

$ kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin --user "$(gcloud config get-value account)"

$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

After that, you need to use the appropriate custom metric. In your case it should be kubernetes.io|pod|network|received_bytes_count
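After deploying the adapter, it can help to confirm that the custom metrics API is actually registered, since that is what the original error complains about. The following is a minimal sketch, not part of the original answer: the helper name custom_metrics_api_registered is hypothetical, and the APIService name v1beta1.custom.metrics.k8s.io is an assumption based on the API group named in the error message.

```shell
# Hypothetical helper: succeeds only if the cluster reports the
# custom metrics APIService. The APIService name is an assumption
# derived from the API group in the error (custom.metrics.k8s.io).
custom_metrics_api_registered() {
  kubectl get apiservice v1beta1.custom.metrics.k8s.io >/dev/null 2>&1
}

if custom_metrics_api_registered; then
  echo "custom metrics API is registered"
else
  echo "custom metrics API is not registered (or kubectl cannot reach the cluster)"
fi
```

If the check fails right after applying the adapter manifest, waiting a minute and retrying is reasonable, because the adapter pod has to start before the API becomes available.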

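Description

The documentation on custom and external metrics for autoscaling workloads explains that you need to deploy the Stackdriver Adapter before you can obtain custom metrics:

Before you can use custom metrics, you must enable Monitoring in your GCP project and install the Stackdriver adapter on your cluster.

The next step is to deploy your application (I used an Nginx deployment for testing) and create the appropriate HPA.

Your HPA example has a few issues: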

apiVersion: autoscaling/v2beta1 # autoscaling/v2beta2 also works if you need more features; v2beta1 is enough for this scenario
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
  namespace: xxxxx # the HPA specifies a namespace, but the Deployment does not
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1 # apps/v1beta1 is old and is no longer served in Kubernetes 1.16+; use apps/v1 there
    kind: Deployment
    name: xxxx-xxx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: received_bytes_count # replace this metric name with kubernetes.io|pod|network|received_bytes_count
      targetAverageValue: 20k

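On GKE you can choose between autoscaling/v2beta1 and autoscaling/v2beta2. Your case works with both apiVersions, but if you decide to use autoscaling/v2beta2 you will need to change the manifest syntax.

Why kubernetes.io/pod/network/received_bytes_count? You are referring to a Kubernetes metric, and /pod/network/received_bytes_count is listed in this documentation.

Why | instead of /? The Stackdriver adapter documentation on GitHub explains this:

Stackdriver metrics have a form of paths separated by the "/" character, but the Custom Metrics API forbids using the "/" character. When using Custom Metrics - Stackdriver Adapter either directly via the Custom Metrics API or by specifying a custom metric in an HPA, replace the "/" character with "|". For example, to use custom.googleapis.com/my/custom/metric, specify custom.googleapis.com|my|custom|metric.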
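The substitution rule quoted above is purely mechanical, so it can be sketched as a small shell helper. This is only an illustration: the function name to_adapter_metric_name is hypothetical and is not part of the adapter or of kubectl.

```shell
# Hypothetical helper: turn a Stackdriver metric path into the name
# expected in an HPA manifest by replacing every "/" with "|".
to_adapter_metric_name() {
  printf '%s\n' "$1" | tr '/' '|'
}

to_adapter_metric_name 'kubernetes.io/pod/network/received_bytes_count'
# prints: kubernetes.io|pod|network|received_bytes_count
```

Because "|" is special in the shell, keep the converted name quoted whenever you pass it to another command; quoting it in the YAML manifest, as in the configurations in this answer, is also a safe habit.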

Correct configuration

For v2beta1

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
spec:
  scaleTargetRef:
    apiVersion: apps/v1 # In your case should be apps/v1beta1 but my deployment was created with apps/v1 apiVersion
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: "kubernetes.io|pod|network|received_bytes_count"
      targetAverageValue: 20k

For v2beta2

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metric:
        name: "kubernetes.io|pod|network|received_bytes_count"
      target:
        type: AverageValue
        averageValue: 20k

Test output

Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    SucceededRescale  the HPA controller was able to update the target scale to 2
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from pods metric kubernetes.io|pod|network|received_bytes_count
  ScalingLimited  True    TooFewReplicas    the desired replica count is more than the maximum replica count
Events:
  Type    Reason             Age                 From                       Message
  ----    ------             ----                ----                       -------
  Normal  SuccessfulRescale  8m18s               horizontal-pod-autoscaler  New size: 4; reason: pods metric kubernetes.io|pod|network|received_bytes_count above target
  Normal  SuccessfulRescale  8m9s                horizontal-pod-autoscaler  New size: 6; reason: pods metric kubernetes.io|pod|network|received_bytes_count above target
  Normal  SuccessfulRescale  17s                 horizontal-pod-autoscaler  New size: 5; reason: All metrics below target
  Normal  SuccessfulRescale  9s (x2 over 8m55s)  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target

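Possible issues with your current configuration

Your HPA specifies a namespace, but the target Deployment does not. The HPA and the Deployment must be in the same namespace, because the HPA looks up its scale target in its own namespace. With a mismatched configuration you may run into the following problem: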

Conditions:
  Type         Status  Reason          Message
  ----         ------  ------          -------
  AbleToScale  False   FailedGetScale  the HPA controller was unable to get the target's current scale: deployments/scale.apps "nginx" not found
Events:
  Type     Reason          Age                  From                       Message
  ----     ------          ----                 ----                       -------
  Warning  FailedGetScale  94s (x264 over 76m)  horizontal-pod-autoscaler  deployments/scale.apps "nginx" not found
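One way to avoid this mismatch is to set the namespace explicitly on both objects. The fragment below is a minimal sketch under assumptions: the namespace default, the app: nginx labels, and the nginx image are placeholders chosen for illustration, and the HPA reuses the v2beta1 syntax shown earlier.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default        # assumed namespace; must match the HPA below
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx            # placeholder label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx        # placeholder image
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: xxxx-hoa
  namespace: default        # same namespace as the Deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx             # resolved inside the HPA's own namespace
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Pods
    pods:
      metricName: "kubernetes.io|pod|network|received_bytes_count"
      targetAverageValue: 20k
```

scaleTargetRef has no namespace field of its own, which is why the two metadata.namespace values have to agree.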

In Kubernetes 1.16+, Deployments use apiVersion: apps/v1. You will not be able to create a Deployment with apiVersion: apps/v1beta1 in Kubernetes 1.16+.

Answer 1
1 2020-11-07 23:10:56

Answer 2
1 Accepted 2020-11-18 14:10:30
