How to monitor Flink Backpressure in Grafana with Prometheus metrics

Question

The Flink Web UI has a brilliant backpressure section, but I cannot see any metrics exposed by the Prometheus reporter that could be used to detect backpressure in the same way for a Grafana dashboard.

Is there some way to get the same information outside of the Flink Web UI, for example by using the metrics described at https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html , or even by having a Prometheus scraper scrape the web API?

Answer 1

The back pressure monitoring that appears in the Flink dashboard isn't using the metrics system, so those values aren't available via a MetricsReporter. But you can access this info via the REST API at

/jobs/:jobid/vertices/:vertexid/backpressure
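
As a rough sketch of how that endpoint could feed a Grafana dashboard, the Python below polls it and turns the per-subtask values into Prometheus text-format lines. It assumes the response is JSON with a "subtasks" list whose entries carry "subtask" and "ratio" fields; check that against the REST API of your Flink version. The base URL, the metric name flink_backpressure_ratio, and the label names are hypothetical choices made for illustration, not something Flink provides.

```python
import json
import urllib.request


def backpressure_url(base_url, job_id, vertex_id):
    """Build the REST path quoted above for one job vertex."""
    return f"{base_url.rstrip('/')}/jobs/{job_id}/vertices/{vertex_id}/backpressure"


def fetch_backpressure(base_url, job_id, vertex_id, timeout=5.0):
    """GET the back pressure sample for one vertex and decode the JSON body."""
    url = backpressure_url(base_url, job_id, vertex_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_prometheus_lines(payload, job_id, vertex_id, metric="flink_backpressure_ratio"):
    """Render one text-format sample per subtask.

    Assumes payload["subtasks"] is a list of dicts with "subtask" and "ratio"
    keys; the metric and label names are made up for this sketch.
    """
    lines = []
    for sub in payload.get("subtasks", []):
        labels = f'job_id="{job_id}",vertex_id="{vertex_id}",subtask="{sub["subtask"]}"'
        lines.append(f"{metric}{{{labels}}} {float(sub['ratio'])}")
    return lines
```

The resulting lines could be served by a small HTTP endpoint that Prometheus scrapes, or pushed through a Pushgateway. Because each request triggers stack trace sampling, a long polling interval is the safer choice.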

While this back pressure detection mechanism is useful, it does have its limitations. It works by calling Thread.getStackTrace(), which is expensive, and some operators (such as AsyncFunction) do critical activities in threads that aren't being sampled.

Another way to investigate back pressure is to set this configuration option in flink-conf.yaml:

taskmanager.network.detailed-metrics: true

and then you can look at the metrics measuring inbound/outbound network queue lengths.
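
Once those queue-length metrics reach Prometheus, a dashboard panel or a small script can flag tasks whose queues stay long. The sketch below uses the standard Prometheus HTTP API instant query endpoint (/api/v1/query). The metric name flink_taskmanager_job_task_buffers_outputQueueLength is an assumption about how the Prometheus reporter names the outbound queue metric, so verify it against what your reporter actually exports. The threshold value is hypothetical and would need tuning per job.

```python
import json
import urllib.parse
import urllib.request

# Assumed exported name of the outbound queue length metric; verify locally.
METRIC = "flink_taskmanager_job_task_buffers_outputQueueLength"


def query_url(prometheus_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{params}"


def long_queues(response, threshold):
    """Return (labels, value) pairs for series whose value exceeds threshold.

    Expects the instant-query response shape: data.result is a list of
    {"metric": {...labels...}, "value": [timestamp, "value-as-string"]}.
    """
    hits = []
    for series in response.get("data", {}).get("result", []):
        value = float(series["value"][1])
        if value > threshold:
            hits.append((series["metric"], value))
    return hits


def check(prometheus_url, threshold=10.0):
    """Query Prometheus and list tasks with long outbound queues.

    The default threshold of 10 buffers is an arbitrary illustration.
    """
    with urllib.request.urlopen(query_url(prometheus_url, METRIC), timeout=5.0) as resp:
        return long_queues(json.load(resp), threshold)
```

In Grafana the same idea is usually expressed directly as a query on that metric, grouped by task, with an alert or colour threshold on the panel.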

solution1
4 ACCEPTED 2019-03-13 17:42:56