Kubernetes HPA with external metric: my external metric is not returning the correct value

I want to scale my worker pods with an HPA based on the total number of outstanding messages across all of my AWS SQS queues. Since no such metric exists, I created a custom metric using a Lambda function. I am using k8s-cloudwatch-adapter, as described in https://aws.amazon.com/blogs/compute/scaling-kubernetes-deployments-with-amazon-cloudwatch-metrics/

I've tested my Lambda function. It returns the correct value, and the metric also gets pushed to CloudWatch. My CloudWatch adapter is able to register the external metric as well. I verified it using the command:

$ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

For some reason, though, it returns a null value rather than the correct value. There are no issues with the cloudwatch-adapter permissions, and the HPA doesn't throw any error. It just shows the value as "0" when, in my case, it should be "15".
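Note that listing `/apis/external.metrics.k8s.io/v1beta1` only shows which external metrics the adapter has registered; the current value is served from `/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric-name>` (for example `.../namespaces/default/outstanding-messages`, assuming the `default` namespace). The sketch below is a hypothetical helper that parses that endpoint's `ExternalMetricValueList` JSON and tells "no items returned" apart from an actual value. The sample payloads are made up for illustration, and the quantity parsing only handles plain numbers and the milli (`m`) suffix.

```python
import json
from decimal import Decimal
from typing import List, Optional


def parse_quantity(quantity: str) -> Decimal:
    """Parse a Kubernetes quantity string (only plain numbers and the 'm' suffix)."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def external_metric_values(raw_json: str, metric_name: str) -> List[Decimal]:
    """All values reported for metric_name in an ExternalMetricValueList body."""
    body = json.loads(raw_json)
    items = body.get("items") or []
    return [parse_quantity(item["value"])
            for item in items if item.get("metricName") == metric_name]


def current_value(raw_json: str, metric_name: str) -> Optional[Decimal]:
    """First reported value, or None when the response contains no items at all."""
    values = external_metric_values(raw_json, metric_name)
    return values[0] if values else None


# Hypothetical bodies, as if saved from:
# kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/outstanding-messages"
SAMPLE = """{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {"metricName": "outstanding-messages", "metricLabels": {},
     "timestamp": "2020-01-01T00:00:00Z", "value": "15"}
  ]
}"""
EMPTY = ('{"kind": "ExternalMetricValueList", '
         '"apiVersion": "external.metrics.k8s.io/v1beta1", "metadata": {}, "items": []}')

print(current_value(SAMPLE, "outstanding-messages"))  # 15
print(current_value(EMPTY, "outstanding-messages"))   # None
```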

I think it's because of some wrong queries in my external metric manifest. This is what all my files look like (not including the cloudwatch-adapter manifest files):

Lambda:

import boto3


def lambda_handler(event, context):
    sqs = boto3.client('sqs')

    # list_queues omits the "QueueUrls" key entirely when no queue matches the prefix
    response = sqs.list_queues(QueueNamePrefix='test')
    queue_urls = response.get('QueueUrls', [])
    print("Total number of queues: %s" % len(queue_urls))

    total_outstanding_messages = 0
    for queue_url in queue_urls:
        attributes = sqs.get_queue_attributes(
            QueueUrl=queue_url,
            AttributeNames=['ApproximateNumberOfMessages'],
        )
        # Attribute values are returned as strings, so convert before summing
        total_outstanding_messages += int(
            attributes['Attributes']['ApproximateNumberOfMessages']
        )
    print("Total number of outstanding messages: %s" % total_outstanding_messages)

    cloudwatch = boto3.client('cloudwatch')

    response = cloudwatch.put_metric_data(
        Namespace='CustomSQSMetrics',
        MetricData=[
            {
                'MetricName': 'OutstandingMessagesTest',
                'Dimensions': [
                    {
                        'Name': 'TotalOutStandingMessages',
                        'Value': 'OutStandingMessages',
                    },
                ],
                # Publish with the unit the ExternalMetric query asks for (unit: Count);
                # a CloudWatch query that specifies a unit only returns data stored with it
                'Unit': 'Count',
                'Values': [
                    total_outstanding_messages,
                ],
            },
        ],
    )
    print(response)

External metric manifest:

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: outstanding-messages
spec:
  name: outstanding-messages
  resource:
    resource: "deployment"
  queries:
    - id: sqs_helloworld
      metricStat:
        metric:
          namespace: "CustomSQSMetrics"
          metricName: "OutstandingMessagesTest"
          dimensions:
            - name: TotalOutStandingMessages
              value: "OutStandingMessages"
        period: 300
        stat: Maximum
        unit: Count
      returnData: true
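A quick way to check whether the query in this manifest returns anything is to send the equivalent `GetMetricData` request yourself. The sketch below mirrors the `queries` block above in the request shape boto3's `get_metric_data` expects; the 15-minute look-back window and the helper names are assumptions for illustration, not the adapter's own settings. boto3 is imported only inside the function that actually calls AWS, so the pure helpers can be exercised without credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def build_query() -> List[Dict[str, Any]]:
    """MetricDataQueries equivalent to the ExternalMetric manifest's queries block."""
    return [
        {
            "Id": "sqs_helloworld",
            "MetricStat": {
                "Metric": {
                    "Namespace": "CustomSQSMetrics",
                    "MetricName": "OutstandingMessagesTest",
                    "Dimensions": [
                        {"Name": "TotalOutStandingMessages", "Value": "OutStandingMessages"}
                    ],
                },
                "Period": 300,
                "Stat": "Maximum",
                "Unit": "Count",
            },
            "ReturnData": True,
        }
    ]


def newest_value(response: Dict[str, Any], query_id: str = "sqs_helloworld") -> Optional[float]:
    """Newest data point for query_id, or None if CloudWatch returned no data points."""
    for result in response.get("MetricDataResults", []):
        if result["Id"] == query_id and result.get("Values"):
            # Values are ordered newest first by default (ScanBy=TimestampDescending)
            return result["Values"][0]
    return None


def fetch_newest_value(lookback_minutes: int = 15) -> Optional[float]:
    """Run the query against CloudWatch (requires boto3 and AWS credentials)."""
    import boto3

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=build_query(),
        StartTime=end - timedelta(minutes=lookback_minutes),
        EndTime=end,
    )
    return newest_value(response)
```

If `fetch_newest_value()` returns `None`, there were no matching data points in the window. Temporarily removing `"Unit"` from the query is a simple way to tell a unit mismatch apart from data that simply isn't being published often enough.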

HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: workers-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: workers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: outstanding-messages
      targetValue: 12
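With `targetValue` (an absolute target, not a per-pod average), the HPA compares the metric value directly with the target and scales the current replica count by that ratio. The sketch below is a simplified model of that calculation, assuming the default 10% tolerance and ignoring stabilization windows, scaling policies, and missing-metric handling. With the expected value of 15, the target of 12, and 1 replica it gives ceil(1 × 15 / 12) = 2 replicas, while a value of 0 keeps the deployment at `minReplicas`.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for an External metric with an absolute targetValue."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance: the replica count is left unchanged
        proposed = current_replicas
    else:
        proposed = math.ceil(ratio * current_replicas)
    # Clamp to minReplicas / maxReplicas from the HPA spec
    return max(min_replicas, min(max_replicas, proposed))


print(desired_replicas(1, 15, 12))   # 2
print(desired_replicas(1, 0, 12))    # 1
print(desired_replicas(4, 200, 12))  # 10
```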

This got resolved. The metric data was being pushed to CloudWatch only when I deployed or tested my Lambda manually, so whenever the external metric tried to fetch the value at any other moment, it received a null value. I added a cron schedule to my Lambda so that it runs every minute. Since then, data is pushed to CloudWatch every minute and is always available for the external metric to pick up. After this change, the external metric was able to get the data and the HPA was able to scale my pods.
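The answer doesn't say how the every-minute schedule was set up. One common way is an EventBridge (CloudWatch Events) rule with the schedule expression `rate(1 minute)` that targets the function. The sketch below shows that wiring with boto3's `events.put_rule`, `lambda.add_permission`, and `events.put_targets`; the rule name, statement ID, target ID, and function ARN are hypothetical placeholders, and the clients are passed in so the wiring can be checked with stand-in objects.

```python
from typing import Any


def schedule_every_minute(events: Any, lambda_client: Any, function_arn: str,
                          rule_name: str = "outstanding-messages-every-minute") -> str:
    """Invoke the metric-publishing Lambda once a minute via an EventBridge rule.

    events and lambda_client are expected to be boto3.client("events") and
    boto3.client("lambda"); rule_name is a hypothetical placeholder.
    """
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(1 minute)",  # a value of 1 requires the singular unit
        State="ENABLED",
    )["RuleArn"]
    # Allow EventBridge to invoke the function, but only from this rule
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "metric-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

Usage, with real clients: `schedule_every_minute(boto3.client("events"), boto3.client("lambda"), "arn:aws:lambda:<region>:<account-id>:function:<function-name>")`. Publishing every minute keeps at least one data point inside each 300-second period the ExternalMetric query asks for.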
