
Getting Kafka Connect JMX metrics reporting into Datadog

I am working on a project involving Kafka Connect. We have a Kafka Connect cluster running on Kubernetes with some Snowflake connectors already spun up and working. The part we are having issues with now is getting the JMX metrics from the Kafka Connect cluster to report in Datadog. From my understanding of the docs ( https://docs.confluent.io/home/connect/monitoring.html#using-jmx-to-monitor-kconnect ), the workers already emit metrics by default and we just need to find a way to get them reported to Datadog.

In our Kubernetes ConfigMap we have these values set:

    CONNECT_KAFKA_JMX_PORT: "9095"
    KAFKA_JMX_PORT: "9095"
    JMX_PORT: "9095"

I have included this launch script where we are setting the KAFKA_JMX_PORT env var:

export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=<redacted> -Dcom.sun.management.jmxremote.rmi.port=${JMX_PORT}"
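Before involving Datadog at all, it can help to confirm that the JMX port accepts TCP connections from wherever the Agent runs. The following is a minimal sketch, not part of the original setup: the function name `jmx_port_open` is hypothetical, and the default port 9095 is simply the value from the ConfigMap above. A successful connection only shows that something is listening on the port; it does not prove that JMX/RMI negotiation will succeed.

```python
import socket


def jmx_port_open(host, port=9095, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Assumption: 9095 is the JMX port configured in the ConfigMap above.
    This is only a reachability check, not a JMX handshake.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `jmx_port_open("<pod-ip>")` from a machine or pod that shares the Agent's network view (the address here is a placeholder) gives a quick yes/no answer before debugging the integration itself.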

I've been looking online and all over Stack Overflow and haven't actually seen an example of people getting JMX metrics reporting to Datadog and standing up a dashboard there, so I was wondering if anyone had experience with this.

Firstly, your Datadog Agents need to have the Java/JMX integration available.

Secondly, use the Datadog JMX integration with Autodiscovery, where kafka-connect in the annotation keys must match the container name.

annotations:
  ad.datadoghq.com/kafka-connect.check_names: '["jmx"]'
  ad.datadoghq.com/kafka-connect.init_configs: '[{}]'
  ad.datadoghq.com/kafka-connect.instances: |
    [
      {
        "host": "%%host%%",
        "port": 9095,
        "conf": [
          {
            "include": {
              "domain": "kafka.connect",
              "type": "connector-task-metrics",
              "bean_regex": [
                "kafka.connect:type=connector-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "batch-size-max": {
                  "alias": "jmx.kafka.connect.connector.batch_size_max"
                },
                "status": {
                  "metric_type": "gauge",
                  "alias": "jmx.kafka.connect.connector.status",
                  "values": {
                    "running":0,
                    "paused":1,
                    "failed":2,
                    "destroyed":3,
                    "unassigned":-1
                  }
                },
                "batch-size-avg": {
                  "alias": "jmx.kafka.connect.connector.batch_size_avg"
                },
                "offset-commit-avg-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_avg_time"
                },
                "offset-commit-max-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_max_time"
                },
                "offset-commit-failure-percentage": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_failure_percentage"
                }
              }
            }
          },
          {
            "include": {
              "domain": "kafka.connect",
              "type": "source-task-metrics",
              "bean_regex": [
                "kafka.connect:type=source-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "source-record-poll-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_rate"
                },
                "source-record-write-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_write_rate"
                },
                "poll-batch-avg-time-ms": {
                  "alias": "jmx.kafka.connect.task.poll_batch_avg_time"
                },
                "source-record-active-count-avg": {
                  "alias": "jmx.kafka.connect.task.source_record_active_count_avg"
                },
                "source-record-write-total": {
                  "alias": "jmx.kafka.connect.task.source_record_write_total"
                },
                "source-record-poll-total": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_total"
                }
              }
            }
          }
        ]
      }
    ]
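Hand-writing this JSON inside a YAML annotation is error-prone, so one option is to generate it. The sketch below is an illustration only: the helper names (`alias_for`, `include_block`, `build_instances`) are hypothetical, and the alias rule (hyphens to underscores, trailing `-ms` dropped) is inferred from the aliases in the annotation above, not taken from Datadog documentation. For brevity it omits the `status` attribute with its `values` mapping, which would need to be added by hand.

```python
import json

# Prefix used by the aliases in the annotation above.
ALIAS_PREFIX = "jmx.kafka.connect"


def alias_for(group, attribute):
    """Derive a metric alias from a JMX attribute name.

    Mirrors the naming in the annotation: a trailing "-ms" unit suffix is
    dropped and hyphens become underscores.
    """
    name = attribute[:-3] if attribute.endswith("-ms") else attribute
    return "{}.{}.{}".format(ALIAS_PREFIX, group, name.replace("-", "_"))


def include_block(mbean_type, group, attributes):
    """Build one "include" filter for a kafka.connect MBean type."""
    return {
        "include": {
            "domain": "kafka.connect",
            "type": mbean_type,
            "bean_regex": [
                "kafka.connect:type={},connector=.*,task=.*".format(mbean_type)
            ],
            "attribute": {a: {"alias": alias_for(group, a)} for a in attributes},
        }
    }


def build_instances(port=9095):
    """Return the JSON string for the ad.datadoghq.com/<container>.instances annotation."""
    conf = [
        include_block("connector-task-metrics", "connector", [
            "batch-size-max",
            "batch-size-avg",
            "offset-commit-avg-time-ms",
            "offset-commit-max-time-ms",
            "offset-commit-failure-percentage",
        ]),
        include_block("source-task-metrics", "task", [
            "source-record-poll-rate",
            "source-record-write-rate",
            "poll-batch-avg-time-ms",
            "source-record-active-count-avg",
            "source-record-write-total",
            "source-record-poll-total",
        ]),
    ]
    # %%host%% is the Autodiscovery template variable used in the annotation above.
    return json.dumps([{"host": "%%host%%", "port": port, "conf": conf}], indent=2)
```

Printing `build_instances()` yields JSON that can be pasted under the `instances` annotation; parsing it back with `json.loads` is a cheap way to catch a stray comma before deploying.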
