
Aggregate metrics from prometheus endpoint

I have a service running in a k8s cluster, which I want to monitor using Prometheus Operator. The service has a /metrics endpoint, which returns simple data like:

myapp_first_queue_length 12
myapp_first_queue_processing 2
myapp_first_queue_pending 10
myapp_second_queue_length 4
myapp_second_queue_processing 4
myapp_second_queue_pending 0
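(For reference, these are exposed as plain gauges. In the full Prometheus exposition format the endpoint would typically also emit HELP/TYPE annotations, along the lines of:

```
# HELP myapp_first_queue_length Current number of items in the first queue.
# TYPE myapp_first_queue_length gauge
myapp_first_queue_length 12
```

The HELP text here is made up; only the metric names and values above are real.)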

The API runs in multiple pods, behind a basic Service object:

apiVersion: v1
kind: Service
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  ports:
  - port: 80
    name: myapp-api
    targetPort: 80
  selector:
    app: myapp-api

I've installed Prometheus using kube-prometheus, and added a ServiceMonitor object:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  selector:
    matchLabels:
      app: myapp-api
  endpoints:
  - port: myapp-api
    path: /api/metrics
    interval: 10s

Prometheus discovers all the pods running instances of the API, and I can query those metrics from the Prometheus graph. So far so good.

The issue is that those metrics are already aggregate: each API instance/pod doesn't have its own queue, so there's no reason to collect those values from every instance. In fact it invites confusion: if Prometheus collects the same value from 10 pods, the total appears to be 10x what it really is, unless you know to apply something like avg.
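To illustrate the duplication (label values here are made up): each scraped pod contributes its own series, distinguished only by target labels, so a query has to collapse them explicitly:

```promql
# Prometheus stores one series per pod, e.g.:
#   myapp_first_queue_length{pod="myapp-api-abc", instance="10.0.1.5:80"}  12
#   myapp_first_queue_length{pod="myapp-api-def", instance="10.0.2.7:80"}  12

# Collapse the duplicates; avg() or max() both work since the values are identical:
avg(myapp_first_queue_length)
```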

Is there a way to either tell Prometheus "this value is already aggregate and should always be presented as such", or better yet, tell Prometheus to just scrape the values once via the internal load balancer for that service, rather than hitting each pod?

Edit

The actual API is just a simple Deployment object:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp-api
  template:
    metadata:
      labels:
        app: myapp-api
    spec:
      imagePullSecrets:
      - name: mysecret
      containers:
      - name: myapp-api
        image: myregistry/myapp:2.0
        ports:
        - containerPort: 80
        volumeMounts:
        - name: config
          mountPath: "app/config.yaml"
          subPath: config.yaml
      volumes:
      - name: config
        configMap:
          name: myapp-api-config

In your case, to avoid metrics aggregation you can use the avg() operator (as already mentioned in your post), or a PodMonitor instead of a ServiceMonitor.

The PodMonitor custom resource definition (CRD) allows you to declaratively define how a dynamic set of pods should be monitored. Which pods are selected for monitoring with the desired configuration is defined using label selectors.
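A PodMonitor for the Deployment above might look roughly like this. Note one assumption: podMetricsEndpoints references container ports by name, so this sketch assumes the containerPort: 80 in the Deployment is given a name such as `myapp-api`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: myapp-api
  labels:
    app: myapp-api
spec:
  selector:
    matchLabels:
      app: myapp-api
  podMetricsEndpoints:
  - port: myapp-api     # must match a named containerPort in the pod spec
    path: /api/metrics
    interval: 10s
```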

This way it will scrape the metrics from the specified pods only.

Prometheus Operator developers are working (as of Jan 2023) on a generic ScrapeConfig CRD that is designed to solve exactly the use case you describe: https://github.com/prometheus-operator/prometheus-operator/issues/2787

In the meantime, you can use the "additional scrape config" facility of Prometheus Operator to set up a custom scrape target.

The idea is that the configured scrape target will be hit only once per scrape interval, and the service DNS will load-balance the request to one of the N pods behind the service, thus avoiding duplicate metrics.

In particular, you can override the kube-prometheus-stack Helm values as follows:

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'myapp-api-aggregates'
        metrics_path: '/api/metrics'
        scheme: 'http'
        static_configs:
          - targets: ['myapp-api:80']

