
Getting Kafka Connect JMX metrics reporting into Datadog

I am working on a project involving Kafka Connect. We have a Kafka Connect cluster running on Kubernetes, with some Snowflake connectors already up and working. What we are struggling with now is getting the JMX metrics from the Kafka Connect cluster to report to Datadog. From my understanding of the docs ( https://docs.confluent.io/home/connect/monitoring.html#using-jmx-to-monitor-kconnect ), the workers already emit metrics by default, and we just need a way to get them reported to Datadog.

In our Kubernetes ConfigMap we have these values set:

    CONNECT_KAFKA_JMX_PORT: "9095"
    KAFKA_JMX_PORT: "9095"
    JMX_PORT: "9095"

Our launch script sets the KAFKA_JMX_OPTS environment variable as follows:

    export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=<redacted> -Dcom.sun.management.jmxremote.rmi.port=${JMX_PORT}"
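One thing worth checking with an export like the one above is that `JMX_PORT` is actually set when the line runs, since an unset variable would leave `rmi.port=` empty. The following is only a sketch of assembling the same flags from explicit arguments so the result can be printed and inspected. The function name `build_jmx_opts`, the fallback to 9095, and the hostname `connect.example.internal` are hypothetical placeholders, not values from the original setup.

```shell
#!/bin/sh
# Sketch: build the JMX flags from explicit arguments.
# build_jmx_opts is a hypothetical helper, not part of Kafka or Confluent.
build_jmx_opts() {
  host="$1"   # value for java.rmi.server.hostname
  port="$2"   # value for com.sun.management.jmxremote.rmi.port
  printf '%s %s %s %s %s\n' \
    "-Dcom.sun.management.jmxremote=true" \
    "-Dcom.sun.management.jmxremote.authenticate=false" \
    "-Dcom.sun.management.jmxremote.ssl=false" \
    "-Djava.rmi.server.hostname=${host}" \
    "-Dcom.sun.management.jmxremote.rmi.port=${port}"
}

# Assumption: fall back to 9095 (the ConfigMap value) if JMX_PORT is unset.
JMX_PORT="${JMX_PORT:-9095}"

# "connect.example.internal" is a placeholder for the redacted hostname.
KAFKA_JMX_OPTS="$(build_jmx_opts "connect.example.internal" "$JMX_PORT")"
export KAFKA_JMX_OPTS

# Print the result so the effective flags can be checked in the container logs.
echo "$KAFKA_JMX_OPTS"
```

Printing the assembled string at startup makes it easy to confirm from the pod logs which port the JVM was actually told to use.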

I've searched online and all over Stack Overflow and haven't seen an example of anyone getting Kafka Connect JMX metrics reporting to Datadog and building a dashboard there, so I was wondering if anyone has experience with this.

First, your Datadog agents need to have the Java/JMX integration enabled.

Second, use the Datadog JMX integration with Autodiscovery via pod annotations like the following, where kafka-connect in each annotation key must match the container name:

annotations:
  ad.datadoghq.com/kafka-connect.check_names: '["jmx"]'
  ad.datadoghq.com/kafka-connect.init_configs: '[{}]'
  ad.datadoghq.com/kafka-connect.instances: |
    [
      {
        "host": "%%host%%",
        "port": 9095,
        "conf": [
          {
            "include": {
              "domain": "kafka.connect",
              "type": "connector-task-metrics",
              "bean_regex": [
                "kafka.connect:type=connector-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "batch-size-max": {
                  "alias": "jmx.kafka.connect.connector.batch_size_max"
                },
                "status": {
                  "metric_type": "gauge",
                  "alias": "jmx.kafka.connect.connector.status",
                  "values": {
                    "running":0,
                    "paused":1,
                    "failed":2,
                    "destroyed":3,
                    "unassigned":-1
                  }
                },
                "batch-size-avg": {
                  "alias": "jmx.kafka.connect.connector.batch_size_avg"
                },
                "offset-commit-avg-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_avg_time"
                },
                "offset-commit-max-time-ms": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_max_time"
                },
                "offset-commit-failure-percentage": {
                  "alias": "jmx.kafka.connect.connector.offset_commit_failure_percentage"
                }
              }
            }
          },
          {
            "include": {
              "domain": "kafka.connect",
              "type": "source-task-metrics",
              "bean_regex": [
                "kafka.connect:type=source-task-metrics,connector=.*,task=.*"
              ],
              "attribute": {
                "source-record-poll-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_rate"
                },
                "source-record-write-rate": {
                  "alias": "jmx.kafka.connect.task.source_record_write_rate"
                },
                "poll-batch-avg-time-ms": {
                  "alias": "jmx.kafka.connect.task.poll_batch_avg_time"
                },
                "source-record-active-count-avg": {
                  "alias": "jmx.kafka.connect.task.source_record_active_count_avg"
                },
                "source-record-write-total": {
                  "alias": "jmx.kafka.connect.task.source_record_write_total"
                },
                "source-record-poll-total": {
                  "alias": "jmx.kafka.connect.task.source_record_poll_total"
                }
              }
            }
          }
        ]
      }
    ]
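Because the annotation keys embed the container name, a mismatch there is an easy mistake to make. As a rough sketch, the keys expected for a given container name can be printed and compared against the pod spec. The helper name `ad_annotation_keys` is hypothetical; the key pattern is taken from the annotations above.

```shell
#!/bin/sh
# Sketch: print the three Autodiscovery annotation keys for a container name.
# ad_annotation_keys is a hypothetical helper, not a Datadog tool.
ad_annotation_keys() {
  container="$1"
  for suffix in check_names init_configs instances; do
    printf 'ad.datadoghq.com/%s.%s\n' "$container" "$suffix"
  done
}

# The container name used in the annotations above.
ad_annotation_keys kafka-connect
```

The container names actually present in a pod can be listed with `kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'`, and one of them should match the name passed to the helper.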
