
How to scale Prometheus in a Kubernetes environment

I have reached the moment when I need to split my Prometheus into smaller ones. I have been reading about it here, but it does not say anything about scaling in Kubernetes. Below is my setup:

There are about 50 namespaces which produce thousands of metrics, and the current setup with one Prometheus is not enough. So I decided to split it into three instances, like:

But after a while I realised that those metrics are scraped via kubernetes_sd_config, and there is no way to tell which metrics I want to scrape with which instance of Prometheus, or am I wrong? One solution would be to split the Kubernetes cluster into smaller ones, but that is too much work for now.

So my question is: is there any possibility to tell Prometheus that I want to scrape only kube-state-metrics, node exporter, or native Kubernetes metrics?

Another option would be going for a horizontally scalable, distributed Prometheus implementation: https://github.com/weaveworks/cortex (NB I wrote this.)

It's not ready for prime time yet, but we're using it internally and getting pretty good results. It will be more effort to set up and operate than upstream Prometheus, but it should scale virtually indefinitely. What's more, we run it on Kubernetes, so it's really at home there.

Let me know if you're interested and I can walk you through setting it up.

Scaling in Kubernetes is the same as elsewhere. This is a question of using service discovery and relabelling to pick out what is monitored.

For example, the configuration for the node exporters should already be a separate scrape_config, so splitting it out to a separate Prometheus should be straightforward by splitting the configuration file.
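For instance, a dedicated node-exporter Prometheus could carry nothing but that one job in its prometheus.yml. The sketch below is an assumption about what such a config might look like, since the original scrape_config is not shown; the job name, discovery role and port 9100 would have to match your actual node-exporter deployment:

    scrape_configs:
      - job_name: 'node-exporter'        # hypothetical job name
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          # rewrite the kubelet address to the node-exporter port (assuming hostPort 9100)
          - source_labels: [__address__]
            regex: '(.*):\d+'
            replacement: '${1}:9100'
            target_label: __address__
          # carry the node name over as a plain label for readability
          - source_labels: [__meta_kubernetes_node_name]
            target_label: node

The kube-state-metrics and native Kubernetes jobs could be split out into their own instances the same way, each with its own copy of the relevant scrape_config.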

I had a similar task for federation. Here is how I did it, following @brian-brazil's answer:

  • set up a master Prometheus with this config:

    scrape_configs:
      - job_name: dc_prometheus
        honor_labels: true
        metrics_path: /federate
        params:
          match[]:
            - '{job="my-kubernetes-job"}'
        static_configs:
          - targets:
            - prometheus-slaveA:9090
            - prometheus-slaveB:9090

See how the slaves are declared here. It's quite static, for sure. Also, the match[] param here tells the master to pull all of the slaves' metrics for that job. You would have to be smarter than that, of course (see the sketch after this list).

  • set up the slaves with this particular relabelling:

    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_slave]
        action: keep
        regex: slaveA

and similarly for slaveB, and so on.
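As a sketch of what "smarter" matching on the master could look like: the /federate endpoint accepts multiple match[] selectors, so the master can pull only the series it actually needs instead of everything a slave scraped. The job and metric names below are made up for illustration:

    params:
      match[]:
        - '{job="kube-state-metrics"}'          # hypothetical job name
        - '{__name__=~"node_(cpu|memory).*"}'   # only node CPU/memory series

Narrowing match[] this way also keeps the master's own ingestion load down, which is the point of federating in the first place.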

Now, for each pod, instead of having the well-known annotation prometheus.io/scrape: true|false, you would have prometheus.io/slave: slaveA|slaveB.
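For illustration, the pod template in a workload would then carry that annotation. A minimal Deployment sketch, assuming the relabelling rule above (the names and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
          annotations:
            prometheus.io/slave: slaveA   # this pod is scraped by prometheus-slaveA only
        spec:
          containers:
            - name: my-app
              image: my-app:latest        # placeholder image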

I described it in more detail here: http://devlog.qaraywa.net/?p=176
