Can't expose Flink metrics to Prometheus
I'm trying to expose the built-in metrics of Flink to Prometheus, but somehow Prometheus doesn't recognize the targets: both the JMX and the PrometheusReporter endpoints are down.
The scrape configuration defined in prometheus.yml looks like this:
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'kafka-server'
    static_configs:
      - targets: ['localhost:7071']
  - job_name: 'flink-jmx'
    static_configs:
      - targets: ['localhost:8789']
  - job_name: 'flink-prom'
    static_configs:
      - targets: ['localhost:9249']
And my flink-conf.yml has the following lines:
#metrics.reporters: jmx, prom
metrics.reporters: jmx, prometheus
#metrics.reporter.jmx.factory.class: org.apache.flink.metrics.jmx.JMXReporterFactory
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 8789
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
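(Editor's note: in Flink, the names listed in `metrics.reporters` act as an allow-list that must match the `<name>` segment of the `metrics.reporter.<name>.*` keys. A sketch of an internally consistent version of the config above, assuming Flink 1.10, would be:)

```yaml
# The reporter names here must match the key segment used below
# ("jmx" and "prom" in metrics.reporter.<name>.*).
metrics.reporters: jmx, prom

metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 8789

metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
```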
However, both Flink targets are down when running a WordCount job with either
java -jar target/flink-word-count.jar --input src/main/resources/loremipsum.txt
flink run target/flink-word-count.jar --input src/main/resources/loremipsum.txt
According to the Flink docs, I don't need any additional dependencies for JMX, and for the Prometheus reporter a copy of the provided flink-metrics-prometheus-1.10.0.jar in flink/lib/ should be enough.
What am I doing wrong? What is missing?
That particular job is going to run to completion pretty quickly, I believe. Even once you get the setup working, there may be no interesting metrics because the job doesn't run long enough for anything to show up.
When you run with a mini-cluster (as java -jar ...), the flink-conf.yaml file isn't loaded (unless you've done something rather special in your job to get it loaded). Note also that this file normally has a .yaml extension; I'm not sure whether it works if .yml is used instead.
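If you do want metrics while running in a mini-cluster, one option is to pass the reporter settings programmatically instead of relying on flink-conf.yaml. A minimal sketch, assuming Flink 1.10 with flink-metrics-prometheus on the classpath (the property names mirror the flink-conf.yaml keys above; the class name and pipeline are placeholders):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WordCountWithMetrics {
    public static void main(String[] args) throws Exception {
        // Build the same settings that flink-conf.yaml would normally carry.
        Configuration conf = new Configuration();
        conf.setString("metrics.reporters", "prom");
        conf.setString("metrics.reporter.prom.class",
                "org.apache.flink.metrics.prometheus.PrometheusReporter");
        conf.setString("metrics.reporter.prom.port", "9249");

        // createLocalEnvironment accepts a Configuration, so the
        // mini-cluster starts with the reporter enabled.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(
                        Runtime.getRuntime().availableProcessors(), conf);

        // ... build the WordCount pipeline on env here ...
        env.execute("WordCount with Prometheus metrics");
    }
}
```

This only helps for the java -jar case; when submitting with flink run, the cluster's own flink-conf.yaml applies.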
You can check the job manager and task manager logs to make sure that the reporters are being loaded.
FWIW, the last time I did this I used the following setup, so that I could scrape from multiple processes:
# flink-conf.yaml
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260
# prometheus.yml
scrape_configs:
  - job_name: 'flink'
    static_configs:
      - targets: ['localhost:9250', 'localhost:9251']
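Before pointing Prometheus at the targets, you can also probe the reporter endpoints directly. A quick check over the port range configured above (assuming the job manager and task managers run on localhost and each reporter picks one port from the range):

```shell
# Probe each port in the configured reporter range; a port that
# answers over HTTP has a live PrometheusReporter bound to it.
for port in $(seq 9250 9260); do
  if curl -fs "http://localhost:${port}/metrics" > /dev/null; then
    echo "reporter responding on port ${port}"
  fi
done
```

If no port answers, the reporter was never started and the logs from the previous step are the place to look; if a port answers but Prometheus still shows the target as down, the mismatch is on the prometheus.yml side.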