

Prometheus Alert Manager for Federation

We have several clusters where our applications are running. We would like to set up a central monitoring cluster which can scrape metrics from the rest of the clusters using Prometheus federation.

To do that, I need to install a Prometheus server in each cluster and a federating Prometheus server in the central cluster. I will also install Grafana in the central cluster to visualise the metrics gathered from the other Prometheus servers.

So the questions are:

  1. Where should I set up the Alertmanager? Only in the central cluster, or does each cluster also need its own Alertmanager?

  2. What is the best practice for alerting while using federation?

  3. I thought I could use an ingress controller to expose each Prometheus server. What is the best practice for communication between the Prometheus servers and the federating server in Kubernetes?

Based on this blog:

Where should I set up the Alertmanager? Only in the central cluster, or does each cluster also need its own Alertmanager?

What is the best practice for alerting while using federation?

The answer here would be to do that on each cluster.

If the data you need to do alerting is moved from one Prometheus to another then you've added an additional point of failure. This is particularly risky when WAN links such as the internet are involved. As far as is possible, you should try and push alerting as deep down the federation hierarchy as possible. For example, an alert about a target being down should be set up on the Prometheus scraping that target, not on a global Prometheus which could be several steps removed.
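
For example, a minimal sketch of keeping alerting local to each cluster, assuming each cluster runs its own Alertmanager reachable as alertmanager:9093 (the service name, file paths, and the rule below are illustrative assumptions, not from the blog):

# prometheus.yml on the per-cluster (source) Prometheus
rule_files:
  - /etc/config/alerts

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 'alertmanager:9093'   # cluster-local Alertmanager (assumed service name)

And the referenced rule file, evaluated on the same Prometheus that performs the scrape:

# /etc/config/alerts -- a local "target down" rule
groups:
  - name: local-availability
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: '{{ $labels.job }}/{{ $labels.instance }} has been down for 5 minutes'

This way an outage of the WAN link or of the central Prometheus does not silence the availability alerts of the individual clusters.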


I thought I could use an ingress controller to expose each Prometheus server. What is the best practice for communication between the Prometheus servers and the federating server in Kubernetes?

I think that depends on the use case; in every doc I checked, the targets are simply listed under scrape_configs.static_configs in prometheus.yml.


Like here:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s

    # keep the labels exposed by the source Prometheus instead of
    # overwriting them with this server's target labels
    honor_labels: true
    metrics_path: '/federate'

    # only series matching these selectors are pulled from /federate
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'

    static_configs:
      - targets:
        - 'source-prometheus-1:9090'
        - 'source-prometheus-2:9090'
        - 'source-prometheus-3:9090'

Or


like here:

prometheus.yml:
    rule_files:
      - /etc/config/rules
      - /etc/config/alerts

    scrape_configs:
      - job_name: 'federate'
        scrape_interval: 15s

        honor_labels: true
        metrics_path: '/federate'

        params:
          'match[]':
            - '{job="prometheus"}'
            - '{__name__=~"job:.*"}'

        static_configs:
          - targets:
            - 'prometheus-server:80'
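
Regarding exposing each Prometheus server with an ingress controller: that can work for cross-cluster federation, as long as you put TLS and authentication in front of it. A minimal sketch, assuming an nginx ingress controller and the hypothetical host prometheus.cluster-a.example.com pointing at the per-cluster prometheus-server service:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring                        # assumed namespace
spec:
  ingressClassName: nginx
  rules:
    - host: prometheus.cluster-a.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-server        # the per-cluster Prometheus service
                port:
                  number: 80

The central Prometheus would then list prometheus.cluster-a.example.com (with scheme: https once TLS is in place) under static_configs of its federate job, instead of the in-cluster service names shown above.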

Additionally, it is worth checking how this was done in this tutorial, where Helm was used to build a central monitoring cluster that federates two Prometheus servers running on two other clusters.

