Horizontal pod autoscaling not working: `unable to get metrics for resource cpu: no metrics returned from heapster`

I'm trying to set up horizontal pod autoscaling after installing Kubernetes with kubeadm.

The main symptom is that `kubectl get hpa` shows the CPU metric in the TARGETS column as `<unknown>`:

$ kubectl get hpa
fibonacci   Deployment/fibonacci   <unknown> / 50%   1         3         1          1h

On further investigation, it appears that the HPA is trying to get the CPU metric from Heapster, but in my configuration the CPU metric is provided by cAdvisor.

I am basing this assumption on the output of `kubectl describe hpa fibonacci`:

Name:                           fibonacci
Namespace:                      default
Labels:                         <none>
Annotations:                        <none>
CreationTimestamp:                  Sun, 14 May 2017 18:08:53 +0000
Reference:                      Deployment/fibonacci
Metrics:                        ( current / target )
  resource cpu on pods  (as a percentage of request):   <unknown> / 50%
Min replicas:                       1
Max replicas:                       3
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason              Message
  --------- --------    -----   ----                -------------   --------    ------              -------
  1h        3s      148 horizontal-pod-autoscaler           Warning     FailedGetResourceMetric     unable to get metrics for resource cpu: no metrics returned from heapster
  1h        3s      148 horizontal-pod-autoscaler           Warning     FailedComputeMetricsReplicas    failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from heapster

Why does the HPA try to get this metric from Heapster instead of cAdvisor?

How can I fix this?

Below are my deployment, the contents of /var/log/container/kube-controller-manager.log, and the output of `kubectl get pods --namespace=kube-system` and `kubectl describe pods`:

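apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fibonacci
  labels:
    app: fibonacci
spec:
  template:
    metadata:
      labels:
        app: fibonacci
    spec:
      containers:
      - name: fibonacci
        image: oghma/fibonacci
        ports:
          - containerPort: 8088
        resources:
          requests:
            memory: "64Mi"
            cpu: "75m"
          limits:
            memory: "128Mi"
            cpu: "100m"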

kind: Service
apiVersion: v1
metadata:
  name: fibonacci
spec:
  selector:
    app: fibonacci
  ports:
    - protocol: TCP
      port: 8088
      targetPort: 8088

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fibonacci
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: fibonacci
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50

$ kubectl describe pods
Name:       fibonacci-1503002127-3k755
Namespace:  default
Node:       kubernetesnode1/
Start Time: Sun, 14 May 2017 17:47:08 +0000
Labels:     app=fibonacci
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"fibonacci-1503002127","uid":"59ea64bb-38cd-11e7-b345-fa163edb1ca...
Status:     Running
Controllers:    ReplicaSet/fibonacci-1503002127
    Container ID:   docker://315375c6a978fd689f4ba61919c15f15035deb9139982844cefcd46092fbec14
    Image:      oghma/fibonacci
    Image ID:       docker://sha256:26f9b6b2c0073c766b472ec476fbcd2599969b6e5e7f564c3c0a03f8355ba9f6
    Port:       8088/TCP
    State:      Running
      Started:      Sun, 14 May 2017 17:47:16 +0000
    Ready:      True
    Restart Count:  0
      cpu:  100m
      memory:   128Mi
      cpu:      75m
      memory:       64Mi
    Environment:    <none>
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-45kp8 (ro)
  Type      Status
  Initialized   True 
  Ready     True 
  PodScheduled  True 
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-45kp8
    Optional:   false
QoS Class:  Burstable
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
        node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:     <none>

$ kubectl get pods --namespace=kube-system

NAME                                        READY     STATUS    RESTARTS   AGE
calico-etcd-k1g53                           1/1       Running   0          2h
calico-node-6n4gp                           2/2       Running   1          2h
calico-node-nhmz7                           2/2       Running   0          2h
calico-policy-controller-1324707180-65m78   1/1       Running   0          2h
etcd-kubernetesmaster                       1/1       Running   0          2h
heapster-1428305041-zjzd1                   1/1       Running   0          1h
kube-apiserver-kubernetesmaster             1/1       Running   0          2h
kube-controller-manager-kubernetesmaster    1/1       Running   0          2h
kube-dns-3913472980-gbg5h                   3/3       Running   0          2h
kube-proxy-1dt3c                            1/1       Running   0          2h
kube-proxy-tfhr9                            1/1       Running   0          2h
kube-scheduler-kubernetesmaster             1/1       Running   0          2h
monitoring-grafana-3975459543-9q189         1/1       Running   0          1h
monitoring-influxdb-3480804314-7bvr3        1/1       Running   0          1h

$ cat /var/log/container/kube-controller-manager.log

"log":"I0514 17:47:08.631314       1 event.go:217] Event(v1.ObjectReference{Kind:\"Deployment\", Namespace:\"default\", Name:\"fibonacci\", UID:\"59e980d9-38cd-11e7-b345-fa163edb1ca6\", APIVersion:\"extensions\", ResourceVersion:\"1303\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set fibonacci-1503002127 to 1\n","stream":"stderr","time":"2017-05-14T17:47:08.63177467Z"}
{"log":"I0514 17:47:08.650662       1 event.go:217] Event(v1.ObjectReference{Kind:\"ReplicaSet\", Namespace:\"default\", Name:\"fibonacci-1503002127\", UID:\"59ea64bb-38cd-11e7-b345-fa163edb1ca6\", APIVersion:\"extensions\", ResourceVersion:\"1304\", FieldPath:\"\"}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: fibonacci-1503002127-3k755\n","stream":"stderr","time":"2017-05-14T17:47:08.650826398Z"}
{"log":"E0514 17:49:00.873703       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:49:00.874034952Z"}
{"log":"E0514 17:49:30.884078       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:49:30.884546461Z"}
{"log":"E0514 17:50:00.896563       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:50:00.89688734Z"}
{"log":"E0514 17:50:30.906293       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:50:30.906825794Z"}
{"log":"E0514 17:51:00.915996       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:51:00.916348218Z"}
{"log":"E0514 17:51:30.926043       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:51:30.926367623Z"}
{"log":"E0514 17:52:00.936574       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:52:00.936903072Z"}
{"log":"E0514 17:52:30.944724       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:52:30.945120508Z"}
{"log":"E0514 17:53:00.954785       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:53:00.955126309Z"}
{"log":"E0514 17:53:30.970454       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:53:30.972996568Z"}
{"log":"E0514 17:54:00.980735       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:54:00.981098832Z"}
{"log":"E0514 17:54:30.993176       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:54:30.993538841Z"}
{"log":"E0514 17:55:01.002941       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:55:01.003265908Z"}
{"log":"W0514 17:55:06.511756       1 reflector.go:323] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:192: watch of \u003cnil\u003e ended with: etcdserver: mvcc: required revision has been compacted\n","stream":"stderr","time":"2017-05-14T17:55:06.511957851Z"}
{"log":"E0514 17:55:31.013415       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:55:31.013776243Z"}
{"log":"E0514 17:56:01.024507       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:56:01.0248332Z"}
{"log":"E0514 17:56:31.036191       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:56:31.036606698Z"}
{"log":"E0514 17:57:01.049277       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:57:01.049616359Z"}
{"log":"E0514 17:57:31.064104       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:57:31.064489485Z"}
{"log":"E0514 17:58:01.073988       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:58:01.074339488Z"}
{"log":"E0514 17:58:31.084511       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:58:31.084839352Z"}
{"log":"E0514 17:59:01.096507       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:59:01.096896254Z"}

You can remove the limits from your deployment and try again. In my deployment I set only resource requests, and it worked. Once you see the Horizontal Pod Autoscaler (HPA) working, you can experiment with limits as well. This discussion indicates that requests alone are sufficient for the HPA.
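To see why the requests are the part that matters, here is a rough Python sketch of the calculation described in the Kubernetes HPA documentation: utilization is the measured usage as a percentage of the CPU request, and the desired replica count is ceil(currentReplicas * currentUtilization / targetUtilization). The function names and the 60m usage figure are hypothetical, and the real controller also applies a tolerance band and other rules that are left out here.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Average pod CPU usage as a percentage of the CPU request."""
    if request_millicores <= 0:
        # With no CPU request there is nothing to divide by, so a
        # utilization percentage cannot be computed.
        raise ValueError("container has no CPU request")
    average = sum(usage_millicores) / len(usage_millicores)
    return 100.0 * average / request_millicores


def desired_replicas(current_replicas, current_percent, target_percent,
                     min_replicas, max_replicas):
    """Scale proportionally to the target, clamped to the min/max bounds."""
    raw = math.ceil(current_replicas * current_percent / target_percent)
    return max(min_replicas, min(max_replicas, raw))


# 75m is the request from the question; 60m of usage is a made-up sample.
percent = cpu_utilization_percent([60], 75)     # 80.0
print(desired_replicas(1, percent, 50, 1, 3))   # prints 2
```

The limits never appear in this calculation, which is consistent with requests alone being enough for the HPA.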

There is an option to enable autoscaling on the cluster pool; make sure to turn it on first.

Then apply your HPA, and don't forget to set CPU and memory requests and limits on your Kubernetes controllers.

One thing to note: if you have multiple containers in your pod, you should specify CPU and memory requests and limits for each container.
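A quick way to check the per-container point is to walk the pod spec and list every container without a CPU request. The Python sketch below assumes the pod spec has already been loaded into a dict (for example from the deployment YAML); the helper name and the `sidecar` container are hypothetical.

```python
def containers_missing_cpu_request(pod_spec):
    """Return the names of containers that declare no resources.requests.cpu."""
    missing = []
    for container in pod_spec.get("containers", []):
        requests = (container.get("resources") or {}).get("requests") or {}
        if "cpu" not in requests:
            missing.append(container["name"])
    return missing


pod_spec = {
    "containers": [
        {"name": "fibonacci",
         "resources": {"requests": {"cpu": "75m", "memory": "64Mi"}}},
        {"name": "sidecar"},  # hypothetical second container with no requests
    ]
}
print(containers_missing_cpu_request(pod_spec))  # prints ['sidecar']
```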


I have also seen this with other apps. There seems to be a bug in the HPA API.

A possible solution is to use a replication controller scaleRef instead:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fibonacci
  namespace: ....
spec:
  scaleRef:
    kind: ReplicationController
    name: fibonacci
    subresource: scale
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50

Untested, so the scaleRef section might need some editing (you used scaleTargetRef).

In case you are using GKE 1.9.x:

There is a bug: you need to disable autoscaling first and then re-enable it. This will show the current value in place of <unknown>.

Try updating to the latest GKE version available.

TL;DR: If you're using AWS EKS and specifying .spec.template.spec.containers[].resources.<requests|limits> didn't work, the problem might be that you don't have the Kubernetes Metrics Server installed.

I ran into this problem with Kubernetes HPAs while using AWS EKS. While hunting for solutions, I came across the command below and ran it to see whether I had Metrics Server installed:

    kubectl get pods -n kube-system

I didn't have it installed. It turns out that AWS has documentation stating that Metrics Server isn't installed on EKS clusters by default. So I followed the steps the doc advises for installing the server:

- Deploy the Metrics Server with the following command:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

- Verify that the metrics-server deployment is running the desired number of pods with the following command.

    kubectl get deployment metrics-server -n kube-system


metrics-server   1/1     1 

That was the solution for me. Once Metrics Server was on my cluster, I could create HPAs that were able to get usage info about their target pods/resources.

PS: You can also run `kubectl get pods -n kube-system` again to confirm the installation.
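If you want to script that confirmation, a small Python helper can scan the text printed by `kubectl get pods -n kube-system` for a running pod whose name starts with a given prefix. This is only a sketch: the function name is hypothetical, it assumes the default column layout (NAME, READY, STATUS, ...), and the sample listing is abridged from the question, where Heapster is present but Metrics Server is not.

```python
def running_pods_with_prefix(listing, prefix):
    """Return names of Running pods whose name starts with the prefix."""
    names = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be a pod row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if name.startswith(prefix) and status == "Running":
            names.append(name)
    return names


listing = """
NAME                                        READY     STATUS    RESTARTS   AGE
heapster-1428305041-zjzd1                   1/1       Running   0          1h
kube-dns-3913472980-gbg5h                   3/3       Running   0          2h
"""
print(running_pods_with_prefix(listing, "metrics-server"))  # prints []
print(running_pods_with_prefix(listing, "heapster"))
```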

I faced a similar issue; hope this helps:

  1. Make sure the apiVersion of the HPA is correct, as the syntax changes slightly from version to version.
  2. Run kubectl autoscale deploy <deployment-name> -n <namespace> --cpu-percent=<percent> --min=<min> --max=<max> --dry-run -o yaml

This gives you the exact HPA syntax for the apiVersion your cluster uses. Amend your Helm hpa.yaml file as per the output, and that should do the trick.
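As an illustration of how the syntax differs between versions, the Python sketch below rewrites an autoscaling/v1 HPA (as a dict) into the autoscaling/v2 layout, where the single targetCPUUtilizationPercentage field becomes an entry in a metrics list. The function name is hypothetical, and which versions your cluster actually serves is an assumption you should check with `kubectl api-versions`.

```python
import copy


def hpa_v1_to_v2(v1_manifest):
    """Rewrite an autoscaling/v1 HPA dict into the autoscaling/v2 layout."""
    v2 = copy.deepcopy(v1_manifest)  # leave the caller's dict untouched
    v2["apiVersion"] = "autoscaling/v2"
    spec = v2["spec"]
    target = spec.pop("targetCPUUtilizationPercentage", None)
    if target is not None:
        # v2 expresses the CPU target as a Resource metric.
        spec["metrics"] = [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization",
                           "averageUtilization": target},
            },
        }]
    return v2


v1 = {
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "fibonacci"},
    "spec": {"minReplicas": 1, "maxReplicas": 3,
             "targetCPUUtilizationPercentage": 50},
}
print(hpa_v1_to_v2(v1)["spec"]["metrics"])
```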

