Prometheus for k8s multi-cluster

Question

I have 3 Kubernetes clusters (prod, test, monitoring). I'm new to Prometheus, so I have tested it by installing it in my test environment with the Helm chart:

# https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
helm install [RELEASE_NAME] prometheus-community/kube-prometheus-stack

But if I want to have metrics from the prod and test clusters, I have to repeat the same Helm installation, and each "kube-prometheus-stack" would be standalone in its own cluster. That is not ideal at all. I'm trying to find a way to have a single Prometheus/Grafana that would federate/aggregate the metrics from each cluster's Prometheus server.

I found this link about Prometheus federation:

https://prometheus.io/docs/prometheus/latest/federation/

If I install the "kube-prometheus-stack" Helm chart and get rid of Grafana on the 2 other clusters, how can I make the 3rd "kube-prometheus-stack", on the 3rd cluster, scrape metrics from the 2 other ones?
Thanks

Answer 1

You have to modify the Prometheus configuration to add a federate job, so it can scrape metrics from the other clusters, as described in the documentation:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s

    honor_labels: true
    metrics_path: '/federate'

    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'

    static_configs:
      - targets:
        - 'source-prometheus-1:9090'
        - 'source-prometheus-2:9090'
        - 'source-prometheus-3:9090'

The params field holds the match[] selectors that decide which series are pulled from the source servers. In this particular example:

It will scrape any series with the label job="prometheus" or a metric name starting with job: from the Prometheus servers at source-prometheus-{1,2,3}:9090
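Since the question uses kube-prometheus-stack, the same federate job can be passed through the chart's values instead of editing prometheus.yml by hand. Below is a sketch of two values files shown in one block, assuming the chart's grafana.enabled, prometheus.ingress, prometheus.prometheusSpec.externalLabels and prometheus.prometheusSpec.additionalScrapeConfigs values (check them against your chart version). The hostnames prometheus.prod.example.com and prometheus.test.example.com are hypothetical, and the match[] selectors assume the chart's default job names kube-state-metrics and node-exporter.

```yaml
# values-prod.yaml: source cluster (use cluster: test for the test cluster)
grafana:
  enabled: false                     # no Grafana in the source clusters
prometheus:
  ingress:
    enabled: true                    # expose this Prometheus to the monitoring cluster
    hosts:
      - prometheus.prod.example.com  # hypothetical hostname
  prometheusSpec:
    externalLabels:
      cluster: prod                  # attached to the series served by /federate
---
# values-monitoring.yaml: monitoring cluster (keeps Grafana)
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: federate
        scrape_interval: 30s
        honor_labels: true           # keep job/instance/cluster labels from the source
        metrics_path: /federate
        params:
          'match[]':
            # assumed default job names in kube-prometheus-stack
            - '{job="kube-state-metrics"}'
            - '{job="node-exporter"}'
        static_configs:
          - targets:
              - prometheus.prod.example.com:80   # hypothetical
              - prometheus.test.example.com:80   # hypothetical
```

Each values file then goes to its own release, for example helm upgrade --install [RELEASE_NAME] prometheus-community/kube-prometheus-stack -f values-prod.yaml in the prod cluster. Federating only selected jobs keeps the /federate responses small; the federation docs do not recommend pulling every series this way.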

You can check the following articles for more insight into Prometheus federation:

Answer 2

You have a few options here:

Option 1:

You can achieve this by running vmagent or grafana-agent in the prod and test clusters and configuring remote write on them to your monitoring cluster.

In this case, though, you will need to install kube-state-metrics and node-exporter separately into the prod and test clusters.

It's also important to add an extra label for the cluster name (or any other unique identifier) before sending metrics via remote write, to make sure that the recording rules from "kube-prometheus-stack" work correctly.
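A minimal sketch of Option 1 with vmagent in the prod cluster, assuming kube-state-metrics and node-exporter are already installed there. The Service address kube-state-metrics.monitoring.svc:8080, the pod label app.kubernetes.io/name=node-exporter, the file path and the vmsingle hostname are all hypothetical; the job names are chosen to match what the kube-prometheus-stack rules are assumed to expect. The cluster label is added with vmagent's -remoteWrite.label flag, and vmagent needs RBAC permission to list pods for kubernetes_sd_configs.

```yaml
# scrape.yml for vmagent in the prod cluster (hypothetical path /etc/vmagent/scrape.yml)
#
# vmagent would be started with flags along these lines:
#   -promscrape.config=/etc/vmagent/scrape.yml
#   -remoteWrite.url=http://vmsingle.monitoring.example.com/api/v1/write
#   -remoteWrite.label=cluster=prod
global:
  scrape_interval: 30s

scrape_configs:
  # kube-state-metrics exposed as a Service (hypothetical name and namespace)
  - job_name: kube-state-metrics
    static_configs:
      - targets:
          - kube-state-metrics.monitoring.svc:8080

  # node-exporter pods discovered through the Kubernetes API
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods labelled app.kubernetes.io/name=node-exporter (assumed label)
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        regex: node-exporter
        action: keep
```

The same file can be reused in the test cluster; only the -remoteWrite.label value (cluster=test) changes.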


Option 2:

You can install the victoria-metrics-k8s-stack chart. It has similar functionality to kube-prometheus-stack: it also installs a bunch of components, recording rules and dashboards.

In this case you install victoria-metrics-k8s-stack in every cluster, but with different values. For the monitoring cluster you can use the default values, with

grafana:
  sidecar:
    dashboards:
      multicluster: true

and a properly configured ingress for vmsingle.
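For the monitoring cluster, the vmsingle ingress might be configured as below. This is a sketch assuming the chart exposes a vmsingle.ingress block with enabled and hosts keys; the exact layout varies between chart versions, so compare it with the chart's own values.yaml. The hostname is hypothetical; with it, the prod and test vmagents would write to http://vmsingle.monitoring.example.com/api/v1/write.

```yaml
# values-monitoring.yaml for victoria-metrics-k8s-stack (monitoring cluster)
grafana:
  sidecar:
    dashboards:
      multicluster: true                  # multi-cluster dashboards, as in the answer
vmsingle:
  ingress:
    enabled: true                         # assumed key: expose vmsingle for remote write
    hosts:
      - vmsingle.monitoring.example.com   # hypothetical hostname
```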

For the prod and test clusters you need to disable a bunch of components:

defaultRules:
  create: false

vmsingle:
  enabled: false
alertmanager:
  enabled: false
vmalert:
  enabled: false
vmagent:
  spec:
    remoteWrite:
      - url: "<vmsingle-ingress>/api/v1/write"
    externalLabels:
      cluster: <cluster-name>

grafana:
  enabled: false
  defaultDashboardsEnabled: false

In this case the chart will deploy vmagent, kube-state-metrics, node-exporter and the scrape configurations for vmagent.


Answer 3

You could try looking at Wavefront. It's a commercial tool now, but you can get a free 30-day trial, and it understands PromQL. So essentially, you could use the same Prometheus rules and config across all clusters, and then use Wavefront to connect to all of those Prometheus instances.

Another option may be Thanos, but I've never used it personally.

Question description

3 solutions

Solution 1
5 votes, accepted, 2020-11-20 15:43:36

Solution 2
2 votes, 2022-05-11 08:27:34

Solution 3
1 vote, 2020-11-19 21:56:20
