
How to monitor Flink Backpressure in Grafana with Prometheus metrics

The Flink Web UI has a brilliant backpressure section, but I cannot find any metrics exposed by the Prometheus reporter that could be used to detect backpressure in the same way in a Grafana dashboard.

Is there some way to get the same information outside of the Flink Web UI, for example by using the metrics described at https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html ? Or even by having a Prometheus scraper scrape the web API?

The back pressure monitoring that appears in the Flink dashboard isn't using the metrics system, so those values aren't available via a MetricsReporter. But you can access this info via the REST API at

/jobs/:jobid/vertices/:vertexid/backpressure
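As a rough sketch of how that endpoint could feed Grafana, the snippet below polls it and turns the per-subtask ratios into Prometheus text-format lines. It assumes the response carries a "subtasks" list whose entries have "subtask" and "ratio" fields; check the response of your Flink version, since the exact fields have changed over time. The metric name flink_backpressure_ratio, the helper names, and the base URL are made up for illustration.

```python
import json
import urllib.request


def backpressure_url(base_url, job_id, vertex_id):
    """Build the REST path quoted above for one job vertex."""
    return f"{base_url.rstrip('/')}/jobs/{job_id}/vertices/{vertex_id}/backpressure"


def fetch_backpressure(base_url, job_id, vertex_id, timeout=5.0):
    """GET the back pressure sample for one vertex and decode the JSON body."""
    url = backpressure_url(base_url, job_id, vertex_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_prometheus_lines(payload, job_id, vertex_id, metric="flink_backpressure_ratio"):
    """Convert a response dict into Prometheus exposition lines.

    Assumption: each entry in payload["subtasks"] has "subtask" and "ratio".
    The metric name is hypothetical, not something Flink itself exports.
    """
    lines = []
    for sub in payload.get("subtasks", []):
        labels = f'job_id="{job_id}",vertex_id="{vertex_id}",subtask="{sub["subtask"]}"'
        lines.append(f"{metric}{{{labels}}} {float(sub['ratio'])}")
    return lines


if __name__ == "__main__":
    # Hypothetical JobManager address and IDs; replace with real values.
    sample = fetch_backpressure("http://localhost:8081", "<jobid>", "<vertexid>")
    print("\n".join(to_prometheus_lines(sample, "<jobid>", "<vertexid>")))
```

The lines could then be served by a small exporter or pushed to a Pushgateway. Keep the polling interval generous, since the first request for a vertex may only trigger sampling rather than return data.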

While this back pressure detection mechanism is useful, it does have its limitations. It works by repeatedly calling Thread.getStackTrace() on the task threads, which is expensive, and some operators (such as AsyncFunction) do critical work in threads that aren't being sampled.

Another way to investigate back pressure is to set this configuration option in flink-conf.yaml

taskmanager.network.detailed-metrics: true

and then you can look at the metrics measuring inbound/outbound network queue lengths.
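Once those metrics reach Prometheus, a quick way to check them outside Grafana is the Prometheus HTTP API. The sketch below builds an instant-query URL and picks out the series whose value is at or above a threshold. The metric name in QUEUE_METRIC is an assumption (the exact name depends on the Flink version and the reporter's scope format), and the Prometheus address and threshold are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumed metric name; look up the real one in your Prometheus instance.
QUEUE_METRIC = "flink_taskmanager_job_task_buffers_outputQueueLength"


def query_url(prometheus_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{params}"


def congested_series(response, threshold):
    """Return (labels, value) pairs whose sample value is >= threshold."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response}")
    hits = []
    for series in response["data"]["result"]:
        # An instant vector sample is [timestamp, "value-as-string"].
        value = float(series["value"][1])
        if value >= threshold:
            hits.append((series["metric"], value))
    return hits


if __name__ == "__main__":
    # Hypothetical Prometheus address and threshold.
    url = query_url("http://localhost:9090", QUEUE_METRIC)
    with urllib.request.urlopen(url, timeout=5.0) as resp:
        body = json.load(resp)
    for labels, value in congested_series(body, threshold=10):
        print(labels, value)
```

In Grafana the same metric can be plotted directly; consistently long output queues on one task together with long input queues downstream usually point at the downstream task as the bottleneck.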

