Kubernetes HPA not scaling up

Question

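This is very strange. I am using an AWS EKS cluster, and my HPA worked fine yesterday and this morning. Starting this afternoon, with nothing changed, my HPA suddenly stopped working!

Here is my HPA: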

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my_hpa_name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my_deployment_name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: my_metrics # MUST match the metrics on custom_metrics API
        target:
          type: AverageValue
          averageValue: 5
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30 # window to consider waiting while scaling Up. default is 0s if empty.
    scaleDown:
      stabilizationWindowSeconds: 300 # window to consider waiting while scaling down. default is 300s if empty.

When I started testing, I made many attempts, and all of them failed:

NAME                        REFERENCE                                   TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
xxxx-hpa   Deployment/xxxx-deployment   <unknown>/5   1         10        0          5s
xxxx-hpa   Deployment/xxxx-deployment   0/5           1         10        1          16s
xxxx-hpa   Deployment/xxxx-deployment   10/5          1         10        1          3m4s
xxxx-hpa   Deployment/xxxx-deployment   9/5           1         10        1          7m38s
xxxx-hpa   Deployment/xxxx-deployment   10/5          1         10        1          8m9s

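As you can see above, the replica count never increases!

When I describe my HPA, it shows no scale-up events. The current value is greater than my target, yet it never scales up!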

Name:                         hpa_name
Namespace:                    default
Labels:                       <none>
Annotations:                  kubectl.kubernetes.io/last-applied-configuration:
                                {"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"hpa_name","name...
CreationTimestamp:            Thu, 04 Mar 2021 20:28:40 -0800
Reference:                    Deployment/my_deployment
Metrics:                      ( current / target )
  "plex_queue_size" on pods:  10 / 5
Min replicas:                 1
Max replicas:                 10
Deployment pods:              1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric my_metrics
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>

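Is anything wrong here?

Could there be a problem with the EKS cluster?

Edit:

I checked the official documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details

"within a globally-configurable tolerance, from the --horizontal-pod-autoscaler-tolerance flag, which defaults to 0.1"

I think even if my metric is 6/5, it should still scale up, since the ratio is greater than 1.0.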
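To make the tolerance rule concrete, here is a minimal Python sketch of the replica calculation described in the linked algorithm details: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within the tolerance of 1.0. The function name and signature are hypothetical, and the sketch deliberately ignores min/max clamping, pod readiness, and stabilization windows, which the real controller also applies.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Simplified HPA formula (hypothetical helper, not the controller's code)."""
    ratio = current_value / target_value
    # Within the tolerance band around 1.0, no scaling is recommended.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(1, 10, 5))    # ratio 2.0 -> 2
print(desired_replicas(1, 6, 5))     # ratio 1.2 exceeds the 0.1 tolerance -> 2
print(desired_replicas(4, 5.25, 5))  # ratio 1.05 is within tolerance -> 4
```

By this formula alone, a reading of 10/5 with 1 replica would recommend 2 replicas, so the tolerance setting by itself does not explain the missing scale-up.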

I clearly saw my HPA working before. Here is some evidence from 2 days ago, when it worked:

NAME           REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa   Deployment/my-deployment   0/5       1         10        1          26s
my-hpa   Deployment/my-deployment   0/5       1         10        1          46s
my-hpa   Deployment/my-deployment   8/5       1         10        1          6m21s
my-hpa   Deployment/my-deployment   8/5       1         10        2          6m36s
my-hpa   Deployment/my-deployment   8/5       1         10        2          6m52s
my-hpa   Deployment/my-deployment   8/5       1         10        4          7m7s
my-hpa   Deployment/my-deployment   7/5       1         10        4          7m38s
my-hpa   Deployment/my-deployment   6750m/5   1         10        6          7m55s

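But now it does not work. I tried creating a new HPA for a different metric, and that one works. Only this one fails. Strange...

New edit: this may be caused by the EKS cluster, because I see: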

kubectl get nodes
NAME                                           STATUS                     ROLES    AGE   VERSION
ip-172-27-177-146.us-west-2.compute.internal   Ready                      <none>   14h   v1.18.9-eks-d1db3c
ip-172-27-183-31.us-west-2.compute.internal    Ready,SchedulingDisabled   <none>   15h   v1.18.9-eks-d1db3c

Does SchedulingDisabled mean the cluster does not have enough capacity for new pods?

Answer 1

Figured it out. This was an EKS cluster issue. I had a resource limit of at most 2 on-demand nodes and at most 2 spot nodes. The cluster needed more nodes.
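For context, `SchedulingDisabled` in the STATUS column means the node is cordoned (marked unschedulable), so the scheduler will not place new pods on it. As a rough illustration, the sketch below counts the nodes that can still accept pods from `kubectl get nodes` text output. The helper name is hypothetical, and the sample is the output pasted in the question.

```python
from typing import List


def schedulable_nodes(kubectl_output: str) -> List[str]:
    """Return names of nodes that are Ready and not cordoned (hypothetical helper)."""
    nodes = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        name, status = fields[0], fields[1]
        conditions = status.split(",")
        if "Ready" in conditions and "SchedulingDisabled" not in conditions:
            nodes.append(name)
    return nodes


sample = """\
NAME                                           STATUS                     ROLES    AGE   VERSION
ip-172-27-177-146.us-west-2.compute.internal   Ready                      <none>   14h   v1.18.9-eks-d1db3c
ip-172-27-183-31.us-west-2.compute.internal    Ready,SchedulingDisabled   <none>   15h   v1.18.9-eks-d1db3c
"""

print(schedulable_nodes(sample))
```

With the sample above, only one of the two nodes is available for new pods, which fits the node-limit explanation in this answer.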

Answer 2

One thing that comes to mind is that your metrics server may not be running properly. Without data from metrics-server, Horizontal Pod Autoscaling will not work.



Answer 1
1 accepted 2021-03-09 17:17:38

Answer 2
0 2021-03-05 08:21:00

Answer 3
0 2021-03-05 11:40:55
