
Not able to see Flink custom metrics in Prometheus

I have a Flink job written in Scala, and I am creating a custom metric to count the number of events in a stream. The job is deployed on Kubernetes, and I can see the system metrics of the job manager and task managers in Prometheus. However, the custom metric does not show up in Prometheus, although it is visible in the Flink UI. Below is the custom metric code:

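    import org.apache.flink.configuration.Configuration
    import org.apache.flink.metrics.Counter
    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector
    import spray.json._ // assumption: parseJson below comes from spray-json

    val sampleProcessFunction = new ProcessFunction[String, String] {
      // Not serialized with the function; initialized in open() on the task manager.
      @transient private var counter: Counter = _

      // Register the counter under the user-defined group "abc".
      override def open(parameters: Configuration): Unit =
        counter = getRuntimeContext.getMetricGroup.addGroup("abc").counter("streamcounter")

      override def processElement(
          value: String,
          ctx: ProcessFunction[String, String]#Context,
          out: Collector[String]): Unit = {
        val result = value.parseJson.toString
        counter.inc() // count one event per element
        out.collect(result)
      }
    }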

  

flink-config.yaml has these entries related to Prometheus:

   taskmanager.network.detailed-metrics: true
   metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
   metrics.reporter.prom.port: 8080

This affects more than the custom metric: no task manager metric under the path taskmanager.job.* is exposed at the metrics endpoint. When I open a shell in a taskmanager pod and curl the metrics endpoint like this:

kubectl exec -it flink-taskmanager-app-7448cdb787-9c48j -- /bin/bash
curl http://localhost:8080/metrics

I only get the Status metrics of the taskmanager:

# HELP flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Mapped_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Used Used (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Used gauge
flink_taskmanager_Status_Flink_Memory_Managed_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments UsedMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Network_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_TotalMemorySegments gauge
flink_taskmanager_Status_Network_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemory AvailableMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemory gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded ClassesLoaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesLoaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 11075.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Max Max (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Max gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 2.68435456E8
# HELP flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage RequestedMemoryUsage (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage gauge
flink_taskmanager_Status_Shuffle_Netty_RequestedMemoryUsage{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Used Used (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Used gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.5252976E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Max Max (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Max gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 7.80140544E8
# HELP flink_taskmanager_Status_JVM_Memory_Direct_Count Count (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_Count gauge
flink_taskmanager_Status_JVM_Memory_Direct_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30065.0
# HELP flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Direct_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225216E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Threads_Count Count (scope: taskmanager_Status_JVM_Threads)
# TYPE flink_taskmanager_Status_JVM_Threads_Count gauge
flink_taskmanager_Status_JVM_Threads_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 51.0
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemory TotalMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemory gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.84252416E8
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time Time (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 55.0
# HELP flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded ClassesUnloaded (scope: taskmanager_Status_JVM_ClassLoader)
# TYPE flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded gauge
flink_taskmanager_Status_JVM_ClassLoader_ClassesUnloaded{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Used Used (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Used gauge
flink_taskmanager_Status_JVM_Memory_Heap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 1.56297264E8
# HELP flink_taskmanager_Status_JVM_CPU_Time Time (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Time gauge
flink_taskmanager_Status_JVM_CPU_Time{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.001E10
# HELP flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed MemoryUsed (scope: taskmanager_Status_JVM_Memory_Direct)
# TYPE flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed gauge
flink_taskmanager_Status_JVM_Memory_Direct_MemoryUsed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.85225217E8
# HELP flink_taskmanager_Status_Shuffle_Netty_UsedMemory UsedMemory (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_UsedMemory gauge
flink_taskmanager_Status_Shuffle_Netty_UsedMemory{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 3.0
# HELP flink_taskmanager_Status_JVM_Memory_Metaspace_Committed Committed (scope: taskmanager_Status_JVM_Memory_Metaspace)
# TYPE flink_taskmanager_Status_JVM_Memory_Metaspace_Committed gauge
flink_taskmanager_Status_JVM_Memory_Metaspace_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 6.7375104E7
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Max Max (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Max gauge
flink_taskmanager_Status_JVM_Memory_Heap_Max{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Committed Committed (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Committed gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.8787328E7
# HELP flink_taskmanager_Status_JVM_Memory_NonHeap_Used Used (scope: taskmanager_Status_JVM_Memory_NonHeap)
# TYPE flink_taskmanager_Status_JVM_Memory_NonHeap_Used gauge
flink_taskmanager_Status_JVM_Memory_NonHeap_Used{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 9.4597576E7
# HELP flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments TotalMemorySegments (scope: taskmanager_Status_Shuffle_Netty)
# TYPE flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments gauge
flink_taskmanager_Status_Shuffle_Netty_TotalMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_Flink_Memory_Managed_Total Total (scope: taskmanager_Status_Flink_Memory_Managed)
# TYPE flink_taskmanager_Status_Flink_Memory_Managed_Total gauge
flink_taskmanager_Status_Flink_Memory_Managed_Total{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.294967296E9
# HELP flink_taskmanager_Status_JVM_CPU_Load Load (scope: taskmanager_Status_JVM_CPU)
# TYPE flink_taskmanager_Status_JVM_CPU_Load gauge
flink_taskmanager_Status_JVM_CPU_Load{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.002796347271376764
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_Count Count (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_Count gauge
flink_taskmanager_Status_JVM_Memory_Mapped_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_Memory_Heap_Committed Committed (scope: taskmanager_Status_JVM_Memory_Heap)
# TYPE flink_taskmanager_Status_JVM_Memory_Heap_Committed gauge
flink_taskmanager_Status_JVM_Memory_Heap_Committed{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 4.429185024E9
# HELP flink_taskmanager_Status_Network_AvailableMemorySegments AvailableMemorySegments (scope: taskmanager_Status_Network)
# TYPE flink_taskmanager_Status_Network_AvailableMemorySegments gauge
flink_taskmanager_Status_Network_AvailableMemorySegments{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 30037.0
# HELP flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity TotalCapacity (scope: taskmanager_Status_JVM_Memory_Mapped)
# TYPE flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity gauge
flink_taskmanager_Status_JVM_Memory_Mapped_TotalCapacity{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0
# HELP flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count Count (scope: taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation)
# TYPE flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count gauge
flink_taskmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count{host="172_26_129_190",tm_id="172_26_129_190:6122_3b582c",} 0.0

Note: No explicit filter or exclusion is configured in the config file.
Can anybody help with how to get the taskmanager.job.* metrics, including the custom metrics?

Can you share more of the custom metric code you've used, so we can see it in context? What you've shared so far isn't obviously correct. Or the problem could be related to the metric name: how are you looking for it in Prometheus?
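On the metric name: judging by the output in the question, the reporter appears to build each Prometheus name as flink_ plus the scope plus the metric name, joined with underscores (the HELP lines show the scope). Below is a minimal Scala sketch of that rule. It assumes that the counter from the question ends up in the taskmanager.job.task.operator scope and that characters outside [a-zA-Z0-9:_] are replaced with '_'. PrometheusNames and expectedName are made-up names for illustration, not Flink API.

```scala
object PrometheusNames {
  // Assumption: anything outside this character set is replaced with '_'.
  private val Unallowed = "[^a-zA-Z0-9:_]".r

  private def sanitize(s: String): String = Unallowed.replaceAllIn(s, "_")

  /** Joins "flink", the scope components, and the metric name with '_'. */
  def expectedName(scope: Seq[String], metricName: String): String =
    (Seq("flink") ++ scope.map(sanitize) ++ Seq(sanitize(metricName))).mkString("_")

  def main(args: Array[String]): Unit = {
    // A name that does appear in the question's curl output:
    println(expectedName(Seq("taskmanager", "Status", "JVM", "Threads"), "Count"))
    // The name to search for in Prometheus, under the assumptions above:
    println(expectedName(Seq("taskmanager", "job", "task", "operator", "abc"), "streamcounter"))
  }
}
```

If that assumption holds, the counter would be queried as flink_taskmanager_job_task_operator_abc_streamcounter rather than as abc.streamcounter.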

You'll find a working example of a custom metric at https://docs.immerok.cloud/docs/how-to-guides/development/measuring-latency/.

Note: I work for Immerok.

Update:

It sounds like the problem is probably on the Prometheus side of things.

A couple of things to check:

  1. I assume you have a line in the config that says metrics.reporters: prom, but didn't share it above.
  2. Make sure you haven't configured Flink to exclude some metrics from being sent to Prometheus. (If the config you shared above is complete, then this isn't the problem.)
  3. Check the Prometheus configuration to see which metrics it is scraping (and how often).
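For the third point, this is roughly what to look for in the Prometheus configuration. It is only a sketch: the job name and the target host names are hypothetical, and on Kubernetes the targets are more likely discovered through service discovery than listed statically. Port 8080 is taken from metrics.reporter.prom.port in the question.

```yaml
scrape_configs:
  - job_name: "flink"              # hypothetical job name
    scrape_interval: 15s           # how often the targets are scraped
    metrics_path: /metrics         # same path as the curl in the question
    static_configs:
      - targets:
          - "flink-jobmanager:8080"    # hypothetical host names
          - "flink-taskmanager:8080"
```

It is also worth checking the same job for metric_relabel_configs entries with a drop action, since those discard series after they have been scraped.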

https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html is a bit dated, but still relevant.

Note: While this isn't the problem, configuring metrics reporters via their class, as with metrics.reporter.prom.class, has been obsolete for a long time and was deprecated in Flink 1.16. It will be removed in 1.17. The line can be updated to:

metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory

The issue was with the Flink version. I was using 1.15.0, which has a reported bug in metrics: https://lists.apache.org/thread/6bd9vmcroh7576d7h1kdcd8czf0b4l73

Basically, when a job runs, the task manager metrics under taskmanager.job.* disappear. After upgrading Flink to 1.15.2, it started working properly. All the metrics, including the custom metrics, are now exported properly.
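To confirm this kind of fix without reading the whole dump by eye, one option is to filter the endpoint output for job-scoped names. The sketch below uses only the Scala standard library and reads the text on stdin, for example the output of the curl command from the question. JobMetricCheck and metricNames are illustrative names, and the flink_taskmanager_job_ prefix is an assumption based on the naming pattern visible in the question's output.

```scala
object JobMetricCheck {
  /** Distinct metric names in Prometheus text-format output that start with `prefix`. */
  def metricNames(exposition: String, prefix: String): List[String] =
    exposition.linesIterator
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#")) // skip HELP/TYPE comments
      .map(_.takeWhile(c => c != '{' && c != ' '))            // keep only the metric name
      .filter(_.startsWith(prefix))
      .toList
      .distinct

  def main(args: Array[String]): Unit = {
    val text = scala.io.Source.stdin.mkString
    val found = metricNames(text, "flink_taskmanager_job_")
    if (found.isEmpty) println("no flink_taskmanager_job_* metrics found")
    else found.foreach(println)
  }
}
```

Against the output posted in the question this would report that nothing was found, since every name there starts with flink_taskmanager_Status_.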

