Not able to see Flink custom metrics in Prometheus
I have a Flink job written in Scala, and I am creating one custom metric to count the number of events in a stream. The job is deployed on Kubernetes, and I can see the system metrics of the job manager and task managers in Prometheus. However, we don't see the custom metric in Prometheus, even though we see it in the Flink UI. Below is the custom metric code:
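import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector
import spray.json._ // assumption: parseJson comes from spray-json

val sampleProcessFunction = new ProcessFunction[String, String] {
  // Not serialized with the function; created per task in open().
  @transient private var counter: Counter = _

  override def open(parameters: Configuration): Unit =
    counter = getRuntimeContext.getMetricGroup.addGroup("abc").counter("streamcounter")

  override def processElement(
      value: String,
      ctx: ProcessFunction[String, String]#Context,
      out: Collector[String]): Unit = {
    val result = value.parseJson.toString
    counter.inc() // one increment per event
    out.collect(result)
  }
}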
flink-config.yaml has these entries related to Prometheus:
taskmanager.network.detailed-metrics: true
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 8080
This affects not only the custom metrics: none of the TaskManager metrics under the path taskmanager.job.* are exposed at the metrics endpoint. When I open a shell in a TaskManager pod and curl the metrics endpoint like this:
kubectl exec -it flink-taskmanager-app-7448cdb787-9c48j -- /bin/bash
curl http://localhost:8080/metrics
I only get the Status metrics of the TaskManager:
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Used Used (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Used gauge
flink_taskmanager_Status_Flink_Memory_Managed_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments UsedMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Network_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_TotalMemorySegments gauge
flink_taskmanager_Status_Network_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemory AvailableMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemory gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded ClassesLoaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 11075.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Max Max (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Max gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 2.68435456E8
# HELP flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage RequestedMemoryUsage (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage gauge
flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Used Used (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Used gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.5252976E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Max Max (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Max gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 7.80140544E8
# HELP flink_taskmanager_Status_JVM_Memory_Direct_Count Count (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_Count gauge
flink_taskmanager_Status_JVM_Memory_Direct_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30065.0
# HELP flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225216E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Threads_Count Count (scope: taskmanager_Status_JVM_Threads)
# TYPE flink_taskmanager_Status_JVM_Threads_Count gauge
flink_taskmanager_Status_JVM_Threads_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 51.0
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemory TotalMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemory gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 55.0
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded ClassesUnloaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Used Used (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Used gauge
flink_taskmanager_Status_JVM_Memory_Heap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 1.56297264E8
# HELP flink_taskmanager_Status_JVM_CPU_Time Time (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Time gauge
flink_taskmanager_Status_JVM_CPU_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.001E10
# HELP flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225217E8
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemory UsedMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemory gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 3.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Committed Committed (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Committed gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.7375104E7
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Max Max (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Max gauge
flink_taskmanager_Status_JVM_Memory_Heap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Committed Committed (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Committed gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.8787328E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Used Used (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Used gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.4597576E7
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Total Total (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Total gauge
flink_taskmanager_Status_Flink_Memory_Managed_Total{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.294967296E9
# HELP flink_taskmanager_Status_JVM_CPU_Load Load (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Load gauge
flink_taskmanager_Status_JVM_CPU_Load{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.002796347271376764
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_Count Count (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_Count gauge
flink_taskmanager_Status_JVM_Memory_Mapped_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Committed Committed (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Committed gauge
flink_taskmanager_Status_JVM_Memory_Heap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_Network_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_AvailableMemorySegments gauge
flink_taskmanager_Status_Network_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
Note: No explicit filter or exclusion is configured in the config file.
Can anybody please help us get the taskmanager.job.* metrics, including the custom metrics?
Can you share more of the custom metric code you've used, so we can see it in context? What you've shared so far isn't obviously correct. Or the problem could be related to the metric name: how are you looking for it in Prometheus?
You'll find a working example of a custom metric in https://docs.immerok.cloud/docs/how-to-guides/development/measuring-latency/ .
Note: I work for Immerok.
Update:
It sounds like the problem is probably on the Prometheus side of things.
A couple of things to check:
I assume you have a line in your configuration that says metrics.reporters: prom, but didn't share it above.
https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html is a bit dated, but still relevant.
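To tell whether a metric is missing at the reporter or only on the Prometheus side, the endpoint output can be filtered for job-scoped series before looking at Prometheus at all. The following is a minimal sketch, not a definitive check: the helper name job_metrics is made up for illustration, port 8080 is taken from the configuration in the question, and it assumes the Prometheus reporter's usual naming, under which job-level series start with flink_taskmanager_job_ and the counter from the question would be expected to end in abc_streamcounter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text output on stdin and keep
# only the job-scoped series (names starting with flink_taskmanager_job_).
job_metrics() {
  # Drop "# HELP" / "# TYPE" comment lines, then keep job-level series.
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -v '^#' | grep '^flink_taskmanager_job_' || true
}

# Usage inside a TaskManager pod:
#   curl -s http://localhost:8080/metrics | job_metrics
#   curl -s http://localhost:8080/metrics | job_metrics | grep 'abc_streamcounter'
```

If the first command prints nothing while a job is running, the series never reach the reporter, and the Prometheus scrape configuration is unlikely to be the cause.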
Note: While this isn't the problem, configuring metrics reporters via their class, as with metrics.reporter.prom.class, has been obsolete for a long time and was deprecated in Flink 1.16. It will be removed in 1.17. This line can be updated to:
metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
The issue was with the Flink version. I was using 1.15.0, which has a reported bug affecting metrics: https://lists.apache.org/thread/6bd9vmcroh7576d7h1kdcd8czf0b4l73
Basically, when a job runs, the TaskManager metrics under taskmanager.job.* disappear. After upgrading Flink to 1.15.2, it started working properly. All the metrics, including the custom metrics, are now exported properly.
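One rough way to confirm this after an upgrade is to count the exported series per top-level scope. This is only a sketch under stated assumptions: count_by_scope is a made-up helper name, and the expectation that a TaskManager running a job shows a non-zero job count follows from the behaviour described above rather than from any documented guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text output on stdin and print
# "<scope>=<count>" for each top-level TaskManager scope (e.g. Status, job).
count_by_scope() {
  grep '^flink_taskmanager_' \
    | sed -E 's/^flink_taskmanager_([A-Za-z]+)_.*/\1/' \
    | sort | uniq -c \
    | awk '{ print $2 "=" $1 }'
}

# Usage inside a TaskManager pod (port 8080 as configured in the question):
#   curl -s http://localhost:8080/metrics | count_by_scope
```

On the affected version, the output described in the question would contain only a Status line; after the upgrade, a job line should appear as well.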