繁体   English   中英

水平 pod 自动缩放不起作用:`无法获取资源 cpu 的指标:没有从 heapster 返回指标`

[英]Horizontal pod autoscaling not working: `unable to get metrics for resource cpu: no metrics returned from heapster`

在使用 kubeadm 安装 Kube.netes 后,我试图创建一个水平 pod 自动缩放。

主要症状是kubectl get hpaTARGETS列中的 CPU 指标返回为“未定义”:

$ kubectl get hpa
NAME        REFERENCE              TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
fibonacci   Deployment/fibonacci   <unknown> / 50%   1         3         1          1h

在进一步调查中,似乎hpa正在尝试从 Heapster 接收 CPU 指标——但在我的配置中,cpu 指标是由 cAdvisor 提供的。

我基于kubectl describe hpa fibonacci的 output 做出这个假设:

Name:                           fibonacci
Namespace:                      default
Labels:                         <none>
Annotations:                        <none>
CreationTimestamp:                  Sun, 14 May 2017 18:08:53 +0000
Reference:                      Deployment/fibonacci
Metrics:                        ( current / target )
  resource cpu on pods  (as a percentage of request):   <unknown> / 50%
Min replicas:                       1
Max replicas:                       3
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason              Message
  --------- --------    -----   ----                -------------   --------    ------              -------
  1h        3s      148 horizontal-pod-autoscaler           Warning     FailedGetResourceMetric     unable to get metrics for resource cpu: no metrics returned from heapster
  1h        3s      148 horizontal-pod-autoscaler           Warning     FailedComputeMetricsReplicas    failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from heapster

为什么hpa尝试从 heapster 而不是 cAdvisor 接收这个指标?

我怎样才能解决这个问题?

请在下面找到我的部署,以及/var/log/container/kube-controller-manager.log的内容和kubectl get pods --namespace=kube-systemkubectl describe pods的 output

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: fibonacci
  labels:
    app: fibonacci
spec:
  template:
    metadata:
      labels:
        app: fibonacci
    spec:
      containers:
      - name: fibonacci
        image: oghma/fibonacci
        ports:
          - containerPort: 8088
        resources:
          requests:
            memory: "64Mi"
            cpu: "75m"
          limits:
            memory: "128Mi"
            cpu: "100m"

---
kind: Service
apiVersion: v1
metadata:
  name: fibonacci
spec:
  selector:
    app: fibonacci
  ports:
    - protocol: TCP
      port: 8088
      targetPort: 8088
  externalIPs: 
    - 192.168.66.103

---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fibonacci
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: fibonacci
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50

$ kubectl describe pods
Name:       fibonacci-1503002127-3k755
Namespace:  default
Node:       kubernetesnode1/192.168.66.101
Start Time: Sun, 14 May 2017 17:47:08 +0000
Labels:     app=fibonacci
        pod-template-hash=1503002127
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"fibonacci-1503002127","uid":"59ea64bb-38cd-11e7-b345-fa163edb1ca...
Status:     Running
IP:     192.168.202.1
Controllers:    ReplicaSet/fibonacci-1503002127
Containers:
  fibonacci:
    Container ID:   docker://315375c6a978fd689f4ba61919c15f15035deb9139982844cefcd46092fbec14
    Image:      oghma/fibonacci
    Image ID:       docker://sha256:26f9b6b2c0073c766b472ec476fbcd2599969b6e5e7f564c3c0a03f8355ba9f6
    Port:       8088/TCP
    State:      Running
      Started:      Sun, 14 May 2017 17:47:16 +0000
    Ready:      True
    Restart Count:  0
    Limits:
      cpu:  100m
      memory:   128Mi
    Requests:
      cpu:      75m
      memory:       64Mi
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-45kp8 (ro)
Conditions:
  Type      Status
  Initialized   True 
  Ready     True 
  PodScheduled  True 
Volumes:
  default-token-45kp8:
    Type:   Secret (a volume populated by a Secret)
    SecretName: default-token-45kp8
    Optional:   false
QoS Class:  Burstable
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
        node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:     <none>

$ kubectl get pods --namespace=kube-system

NAME                                        READY     STATUS    RESTARTS   AGE
calico-etcd-k1g53                           1/1       Running   0          2h
calico-node-6n4gp                           2/2       Running   1          2h
calico-node-nhmz7                           2/2       Running   0          2h
calico-policy-controller-1324707180-65m78   1/1       Running   0          2h
etcd-kubernetesmaster                       1/1       Running   0          2h
heapster-1428305041-zjzd1                   1/1       Running   0          1h
kube-apiserver-kubernetesmaster             1/1       Running   0          2h
kube-controller-manager-kubernetesmaster    1/1       Running   0          2h
kube-dns-3913472980-gbg5h                   3/3       Running   0          2h
kube-proxy-1dt3c                            1/1       Running   0          2h
kube-proxy-tfhr9                            1/1       Running   0          2h
kube-scheduler-kubernetesmaster             1/1       Running   0          2h
monitoring-grafana-3975459543-9q189         1/1       Running   0          1h
monitoring-influxdb-3480804314-7bvr3        1/1       Running   0          1h

$ cat /var/log/container/kube-controller-manager.log

"log":"I0514 17:47:08.631314       1 event.go:217] Event(v1.ObjectReference{Kind:\"Deployment\", Namespace:\"default\", Name:\"fibonacci\", UID:\"59e980d9-38cd-11e7-b345-fa163edb1ca6\", APIVersion:\"extensions\", ResourceVersion:\"1303\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set fibonacci-1503002127 to 1\n","stream":"stderr","time":"2017-05-14T17:47:08.63177467Z"}
{"log":"I0514 17:47:08.650662       1 event.go:217] Event(v1.ObjectReference{Kind:\"ReplicaSet\", Namespace:\"default\", Name:\"fibonacci-1503002127\", UID:\"59ea64bb-38cd-11e7-b345-fa163edb1ca6\", APIVersion:\"extensions\", ResourceVersion:\"1304\", FieldPath:\"\"}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: fibonacci-1503002127-3k755\n","stream":"stderr","time":"2017-05-14T17:47:08.650826398Z"}
{"log":"E0514 17:49:00.873703       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:49:00.874034952Z"}
{"log":"E0514 17:49:30.884078       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:49:30.884546461Z"}
{"log":"E0514 17:50:00.896563       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:50:00.89688734Z"}
{"log":"E0514 17:50:30.906293       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:50:30.906825794Z"}
{"log":"E0514 17:51:00.915996       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:51:00.916348218Z"}
{"log":"E0514 17:51:30.926043       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:51:30.926367623Z"}
{"log":"E0514 17:52:00.936574       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:52:00.936903072Z"}
{"log":"E0514 17:52:30.944724       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:52:30.945120508Z"}
{"log":"E0514 17:53:00.954785       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:53:00.955126309Z"}
{"log":"E0514 17:53:30.970454       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:53:30.972996568Z"}
{"log":"E0514 17:54:00.980735       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:54:00.981098832Z"}
{"log":"E0514 17:54:30.993176       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:54:30.993538841Z"}
{"log":"E0514 17:55:01.002941       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:55:01.003265908Z"}
{"log":"W0514 17:55:06.511756       1 reflector.go:323] k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:192: watch of \u003cnil\u003e ended with: etcdserver: mvcc: required revision has been compacted\n","stream":"stderr","time":"2017-05-14T17:55:06.511957851Z"}
{"log":"E0514 17:55:31.013415       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:55:31.013776243Z"}
{"log":"E0514 17:56:01.024507       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:56:01.0248332Z"}
{"log":"E0514 17:56:31.036191       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:56:31.036606698Z"}
{"log":"E0514 17:57:01.049277       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:57:01.049616359Z"}
{"log":"E0514 17:57:31.064104       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:57:31.064489485Z"}
{"log":"E0514 17:58:01.073988       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:58:01.074339488Z"}
{"log":"E0514 17:58:31.084511       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:58:31.084839352Z"}
{"log":"E0514 17:59:01.096507       1 horizontal.go:201] failed to compute desired number of replicas based on listed metrics for Deployment/default/fibonacci: failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: the server could not find the requested resource (get services http:heapster:)\n","stream":"stderr","time":"2017-05-14T17:59:01.096896254Z"}

您可以从部署中删除 LIMITS 并尝试一下。 在我的部署中,我只使用了 REQUESTS for RESOURCES 并且它有效。 如果您看到 Horizo​​ntal Pod Autoscaler (HPA) 正在工作,那么稍后您也可以使用 LIMITS。 此讨论告诉您,仅使用 REQUESTS 就足以执行 HPA。

有一个选项可以在集群池上启用自动缩放,请确保先将其打开。

然后应用你的 hpa,不要忘记在 k8s 控制器上设置 CPU、内存请求和限制

需要注意的一件事是,如果您的 pod 上有多个容器,那么您应该为每个容器指定 CPU、内存请求和限制

如果在您的部署中您有多个容器,请确保您已在所有容器中指定资源限制。

我也在其他应用程序中看到了这一点:HPA API 中似乎有一个错误。

解决方案可以是使用复制控制器 scaleref 代替:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: fibonacci
  namespace: ....
spec:
  scaleRef:
    kind: ReplicationController
    name: fibonacci
    subresource: scale
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 50

未经测试,因此可能需要对scaleRef一些编辑(您使用了scaleTargetRef

如果您使用的是 GKE 1.9.x

有一些错误,需要先禁用自动缩放,然后再重新启用它。 这将提供当前值代替未知

尝试更新到可用的最新 GKE。

Tl;dr :如果您使用 AWS EKS 并指定.spec.templates.spec.containers.<resources|limits>不起作用,问题可能是您没有安装Kubernetes Metrics Server

我在使用 AWS EKS 时遇到了 Kubernetes HPA 的这个问题。 在寻找解决方案时,我遇到了下面的命令并决定运行它以查看是否安装了 Metrics Server:

kubectl get pods -n kube-system

我没有安装它。 事实证明,AWS 有此文档指出,默认情况下,度量服务器未安装在 EKS 集群上。 所以我按照文档建议的步骤安装服务器:

- Deploy the Metrics Server with the following command:

    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

- Verify that the metrics-server deployment is running the desired number of pods with the following command.

    kubectl get deployment metrics-server -n kube-system

Output

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1 

这就是我的解决方案。 一旦 Metric Server 在我的集群上,我就成功地创建了 HPA,这些 HPA 能够获取有关其目标 pod/资源的使用信息。

PS:您也可以再次运行kubectl get pods -n kube-system来确认安装。

我遇到了类似的问题,希望这会有所帮助:

  1. 确保 HPA 的 ApiVersion 是正确的,因为语法在版本之间略有变化
  2. 执行 kubectl autoscale deploy -n --cpu-percent= --min= --max= --dry-run -o yaml

现在,这将根据集群的 ApiVersion 为您提供 HPA 的确切语法。 根据 output 修改您的 helm hpa.yaml 文件,这应该可以解决问题。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM