Can't expose Flink metrics to Prometheus
I'm trying to expose the built-in metrics of Flink to Prometheus, but somehow Prometheus doesn't recognize the targets: both the JMX and the PrometheusReporter endpoints are down.
The scrape configuration defined in prometheus.yml looks like this:
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'kafka-server'
    static_configs:
      - targets: ['localhost:7071']
  - job_name: 'flink-jmx'
    static_configs:
      - targets: ['localhost:8789']
  - job_name: 'flink-prom'
    static_configs:
      - targets: ['localhost:9249']
And my flink-conf.yml has the following lines:
#metrics.reporters: jmx, prom
metrics.reporters: jmx, prometheus
#metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 8789
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
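(Editor's note: in Flink, the names listed in `metrics.reporters` act as an allow-list that must match the `<name>` segment of the `metrics.reporter.<name>.*` keys. A sketch of an internally consistent version of the config above, assuming Flink 1.10, would be:)

```yaml
# The reporter names here must match the key segment used below
# ("jmx" and "prom" in metrics.reporter.<name>.*).
metrics.reporters: jmx, prom

metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 8789

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
```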
However, both Flink targets are down when running a WordCount job with either
java -jar target/flink-word-count.jar --input src/main/resources/loremipsum.txt
flink run target/flink-word-count.jar --input src/main/resources/loremipsum.txt
According to the Flink docs, I don't need any additional dependencies for JMX, and for the Prometheus reporter a copy of the provided flink-metrics-prometheus-1.10.0.jar in flink/lib/ should be enough.
What am I doing wrong? What is missing?
That particular job is going to run to completion pretty quickly, I believe. Even once you get the setup working, there may be no interesting metrics because the job doesn't run long enough for anything to show up.
When you run with a mini-cluster (as java -jar ...), the flink-conf.yaml file isn't loaded (unless you've done something rather special in your job to get it loaded). Note also that this file normally has a .yaml extension; I'm not sure whether it works if .yml is used instead.
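If you do want metrics while running in a mini-cluster, one option is to pass the reporter settings programmatically instead of relying on flink-conf.yaml. A minimal sketch, assuming Flink 1.10 with flink-metrics-prometheus on the classpath (the property names mirror the flink-conf.yaml keys above; the class name and pipeline are placeholders):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCountWithMetrics {
    public static void main(String[] args) throws Exception {
        // Build the same settings that flink-conf.yaml would normally carry.
        Configuration conf = new Configuration();
        conf.setString("metrics.reporters", "prom");
        conf.setString("metrics.reporter.prom.class",
                "org.apache.flink.metrics.prometheus.PrometheusReporter");
        conf.setString("metrics.reporter.prom.port", "9249");

        // createLocalEnvironment accepts a Configuration, so the
        // mini-cluster starts with the reporter enabled.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(
                        Runtime.getRuntime().availableProcessors(), conf);

        // ... build the WordCount pipeline on env here ...
        env.execute("WordCount with Prometheus metrics");
    }
}
```

This only helps for the java -jar case; when submitting with flink run, the cluster's own flink-conf.yaml applies.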
You can check the job manager and task manager logs to make sure that the reporters are being loaded.
FWIW, the last time I did this I used the following setup, so that I could scrape from multiple processes:
# flink-conf.yaml
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260
# prometheus.yml
scrape_configs:
  - job_name: 'flink'
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
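Before pointing Prometheus at the targets, you can also probe the reporter endpoints directly. A quick check over the port range configured above (assuming the job manager and task managers run on localhost and each reporter picks one port from the range):

```shell
# Probe each port in the configured reporter range; a port that
# answers over HTTP has a live PrometheusReporter bound to it.
for port in $(seq 9250 9260); do
  if curl -fs "http://localhost:${port}/metrics" > /dev/null; then
    echo "reporter responding on port ${port}"
  fi
done
```

If no port answers, the reporter was never started and the logs from the previous step are the place to look; if a port answers but Prometheus still shows the target as down, the mismatch is on the prometheus.yml side.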