
Kubernetes autoscaling: HPA not working with custom metrics for Java Netty API

I am setting up HPA on a custom metric - basically on the number of threads of a deployment.

I have created a PrometheusRule to record the average thread count (over a 5-minute window). On the container, I am generating continuous load to increase the thread count, and the average value is rising as expected.

I started with 2 replicas, and even though the current value has crossed the target value, I am not seeing my deployment scale out.

As you can see, I have set the target to 44 and the current value has been 51.55 for more than 10 minutes, but there is still no scale-up. (screenshot: HPA status)

Version Info

  • Kubernetes (AKS): 1.19.11
  • Prometheus: 2.22.1
  • Setup done via prometheus-operator (0.7)
  • Autoscaling API version: autoscaling/v2beta2

Prometheus Rule

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: rdp-rest
  namespace: default   
  labels:
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/version: 1.0.1
    prometheus: k8s
    role: alert-rules
    run: rdp-rest
    app: rdp-rest
spec:
  groups:
  - name: hpa-rdp-rest
    interval: 10s
    rules:
    - expr: 'avg_over_time(container_threads{container="rdp-rest"}[5m])'
      record: hpa_custom_metrics_container_threads_rdp_rest
      labels:
        service: rdp-rest
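For reference, the HPA consuming this recorded metric would look roughly like the sketch below. This is an illustration, not the exact manifest from the question: it assumes the recorded metric is exposed per-pod through prometheus-adapter under the same name, and the replica bounds are placeholders.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: rdp-rest
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rdp-rest
  minReplicas: 2          # matches the 2 starting replicas above
  maxReplicas: 10         # placeholder upper bound
  metrics:
  - type: Pods
    pods:
      metric:
        # assumes prometheus-adapter exposes the recording rule under this name
        name: hpa_custom_metrics_container_threads_rdp_rest
      target:
        type: AverageValue
        averageValue: "44"   # the target shown in the screenshot
```

With `type: Pods`, the HPA averages the metric across all pods matched by the target deployment's label selector, which is relevant to the root cause found later.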

Manifests - https://github.com/prometheus-operator/kube-prometheus/tree/release-0.7/manifests

Update (6th July) - HPA with custom metrics is working fine for other technologies like nodejs/nginx, etc., but not for the Netty API.

Any thoughts?

Finally, after a week, I found the root cause.

The issue was with the labels. I had 2 deployments carrying the same label. What the HPA does internally is fetch stats for all pods matching that label and then scale up/down based on the aggregate. As soon as I corrected the labels, the HPA worked as expected.
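To make the failure mode concrete, here is a minimal sketch (all names hypothetical) of the clash: two deployments whose pod templates carry the same label value, so a selector on that label aggregates the metric across both sets of pods. Giving each deployment a unique label value is the fix.

```yaml
# BEFORE: both deployments stamp their pods with app: rdp-rest,
# so any selector on app=rdp-rest matches pods from BOTH deployments
# and the HPA averages the thread metric across all of them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rdp-rest
spec:
  selector:
    matchLabels:
      app: rdp-rest
  template:
    metadata:
      labels:
        app: rdp-rest            # clashes with the deployment below
    spec:
      containers:
      - name: rdp-rest
        image: example/rdp-rest  # placeholder image
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rdp-rest-worker          # hypothetical second deployment
spec:
  selector:
    matchLabels:
      app: rdp-rest              # FIX: change to a unique value, e.g. rdp-rest-worker
  template:
    metadata:
      labels:
        app: rdp-rest            # FIX: change to match the unique selector
    spec:
      containers:
      - name: rdp-rest-worker
        image: example/rdp-rest-worker  # placeholder image
```

After the fix, the HPA's pod set contains only the pods of the deployment it actually targets, so the averaged metric reflects the right workload.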

But the same query on the Prometheus UI shows stats for only one type of pod. It looks like some internal bug or similar. I don't understand why, when we provide the name, it still goes and fetches stats based on the label.
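One way to see exactly what the HPA sees (as opposed to what the Prometheus UI shows) is to query the custom metrics API directly. These are command fragments to run against a cluster; the metric name is the one from the recording rule above.

```shell
# List pods matched by the label the metric is scoped to - if pods from
# more than one deployment appear, the HPA is averaging across all of them.
kubectl get pods -l run=rdp-rest

# Query the custom metrics API the HPA actually consumes, to compare its
# per-pod values against what the Prometheus UI reports.
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/hpa_custom_metrics_container_threads_rdp_rest" | jq .
```

If the two disagree, the discrepancy is usually in the adapter's label/selector mapping rather than in Prometheus itself.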

Point to remember: always double-check your labels.
