Why is the monitored Kafka consumer lag value not always correct?

Question

I am running Apache Kafka on Kubernetes using minikube, and I also have a Pod with a Confluent consumer written in Python. I am monitoring Kafka broker metrics with a JMX exporter and consumer metrics with the kminion consumer exporter. These exporters also run as two separate Pods. Lastly, I have Prometheus scraping both exporters and reading the metrics.

I am producing 2 messages per second to a certain topic. My consumer consumes a message and then runs a task. The task needs 0.4 seconds to complete, so I am also consuming at a rate of 2 messages per second.

My hypothesis is that the queue lag metric should be either zero or 2 at all times since I am producing and consuming at the same rate. I am monitoring the queue every second and this is what I get over a period of 5 seconds:

t = 0: Queue is 0.
t = 1: Queue is 3.
t = 2: Queue is 5.
t = 3: Queue is 7.
t = 4: Queue is 9.
t = 5: Queue is 0.

The same cycle then repeats, so the avg_over_time of the queue lag is 5. Why is this happening? I know the consumer cannot consume 9 messages at once, since it runs a task that takes 0.3 seconds to complete, and therefore my maximum consume rate is 2 per second.

I have also tried using a different exporter for consumer metrics but I still get the same results.

Answer 1

When is your consumer committing the offsets?

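If you are not manually committing offsets after processing each message, the consumer auto-commits them every 5 seconds by default: https://kafka.apache.org/documentation/#consumerconfigs_auto.commit.interval.ms

Consumer group lag is computed as the log end offset minus the group's last committed offset, not the position the consumer has actually read up to. Between commits the reported lag therefore keeps growing even though the consumer is keeping up, and it drops back to zero when the next commit lands. That would explain the queue values you see.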
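To illustrate the effect, here is a minimal simulation, not your actual setup. It assumes lag is sampled once per second as (messages produced) minus (last committed offset), and that a commit happens exactly every commit_interval seconds. The function name observed_lag and its parameters are made up for this sketch.

```python
def observed_lag(produce_rate, consume_rate, commit_interval, seconds):
    """Return the lag an exporter would report at each whole second.

    Rates are messages per second, commit_interval is in seconds.
    Lag is measured against the last committed offset, not against
    what the consumer has actually processed.
    """
    samples = []
    committed = 0
    for t in range(seconds + 1):
        produced = produce_rate * t
        # The consumer can never be ahead of the producer.
        consumed = min(produced, consume_rate * t)
        # Offsets only become visible to the exporter on commit.
        if t % commit_interval == 0:
            committed = consumed
        samples.append(produced - committed)
    return samples


# Producer and consumer both at 2 msg/s, commit every 5 s.
print(observed_lag(2, 2, 5, 10))
# [0, 2, 4, 6, 8, 0, 2, 4, 6, 8, 0]

# Same rates, but committing every second.
print(observed_lag(2, 2, 1, 5))
# [0, 0, 0, 0, 0, 0]
```

The first run shows the same sawtooth shape as your measurements (the exact numbers differ because real commits and scrapes are not aligned to whole seconds). If you want the metric to track real processing more closely, you could set enable.auto.commit to False and call consumer.commit(message=msg) after each processed message, or lower auto.commit.interval.ms, at the cost of more commit traffic.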

solution1
0 ACCPTED 2022-05-19 09:51:04