How to handle session timeout while processing Kafka messages?

I am processing messages from Kafka in a standard processing loop:

  while (true) {
      ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
      for (ConsumerRecord<String, String> record : records) {
          processMessage(record);
      }
  }

What should I do if my Kafka consumer runs into a timeout while processing the records? I mean the timeout controlled by the property session.timeout.ms. When this happens, my consumer should stop processing the records, because it loses its partitions and the records it is processing may already have been processed by another consumer. If the original consumer writes processing results into a database, it could overwrite the records produced by the "new" consumer that took over the partitions after my original consumer timed out.

I know about the ConsumerRebalanceListener, but as I understand it, its onPartitionsLost method is only called during a subsequent call to poll() on the consumer. Therefore it doesn't help me stop processing the batch of records returned by the previous poll.

I would expect the heartbeat thread to notify me when it cannot reach the broker and the consumer's session has timed out, but there doesn't seem to be anything like that. Am I missing something?
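For reference, here is a minimal sketch of the consumer settings involved, written with plain java.util.Properties so it compiles without the Kafka client. The values are illustrative assumptions, not recommendations, and the class name ConsumerTimeouts is hypothetical. One distinction worth keeping in mind: session.timeout.ms is enforced through heartbeats sent by a background thread, whereas max.poll.interval.ms limits the time between poll() calls, which is the limit a slow processing loop usually runs into.

```java
import java.util.Properties;

public class ConsumerTimeouts {
    // Example values only; tune them for your own workload.
    static Properties timeoutProps() {
        Properties p = new Properties();
        // The group coordinator evicts the member if no heartbeat arrives within this window.
        p.put("session.timeout.ms", "45000");
        // How often the background heartbeat thread sends heartbeats;
        // the Kafka docs suggest no more than a third of session.timeout.ms.
        p.put("heartbeat.interval.ms", "3000");
        // Maximum time between poll() calls; exceeding it makes the consumer leave
        // the group even though heartbeats are still being sent.
        p.put("max.poll.interval.ms", "300000");
        // Upper bound on records returned by one poll(), which bounds how long a batch takes.
        p.put("max.poll.records", "100");
        return p;
    }

    public static void main(String[] args) {
        Properties p = timeoutProps();
        int session = Integer.parseInt(p.getProperty("session.timeout.ms"));
        int heartbeat = Integer.parseInt(p.getProperty("heartbeat.interval.ms"));
        // Sanity check for the "at most one third" guideline.
        System.out.println("heartbeat <= session/3: " + (heartbeat * 3 <= session));
    }
}
```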

Adding this as an answer as it would be too long in a comment.

Kafka supports a few delivery semantics for processing messages:

  • At most once;
  • At least once; and
  • Exactly once.
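The difference between the first two comes down to whether the offset is committed before or after processing. The sketch below is a plain-Java simulation, not Kafka client code: a hypothetical run method walks a list standing in for a partition, "crashes" once in the gap between processing and committing, and then restarts from the last committed offset.

```java
import java.util.ArrayList;
import java.util.List;

public class DeliverySemantics {
    enum Mode { AT_MOST_ONCE, AT_LEAST_ONCE }

    // Consumes `log` from offset 0, crashes once at index `crashAt`, then resumes
    // from the committed offset. Returns every processed record, duplicates included.
    static List<String> run(List<String> log, Mode mode, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0;       // offset the consumer resumes from after a restart
        boolean crashed = false; // only crash on the first pass
        while (committed < log.size()) {
            for (int offset = committed; offset < log.size(); offset++) {
                boolean crashNow = !crashed && offset == crashAt;
                if (mode == Mode.AT_MOST_ONCE) {
                    committed = offset + 1;              // commit first...
                    if (crashNow) { crashed = true; break; }
                    processed.add(log.get(offset));      // ...then process
                } else {
                    processed.add(log.get(offset));      // process first...
                    if (crashNow) { crashed = true; break; }
                    committed = offset + 1;              // ...then commit
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c");
        System.out.println(run(log, Mode.AT_MOST_ONCE, 1));  // [a, c]
        System.out.println(run(log, Mode.AT_LEAST_ONCE, 1)); // [a, b, b, c]
    }
}
```

With a crash at index 1, at-most-once loses "b" entirely, while at-least-once processes it twice.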

What you are describing is exactly-once semantics (which, by the way, is the least common way of using Kafka). Producers also need to play nicely: unless the producer is configured for idempotence, retries can write the same message more than once.
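To make "playing nicely" more concrete, here is a hedged sketch of the client settings Kafka's exactly-once support relies on, again using plain java.util.Properties with example values (the class name and the transactional.id value are hypothetical). Note that Kafka transactions cover reads and writes within Kafka itself; a write to an external database, as in the question, is not part of the transaction.

```java
import java.util.Properties;

public class ExactlyOnceConfig {
    // Producer side: idempotence removes duplicates caused by retries, and a
    // transactional.id lets output records and consumed offsets commit atomically.
    static Properties producerProps() {
        Properties p = new Properties();
        p.put("enable.idempotence", "true");
        p.put("acks", "all"); // idempotence requires acknowledgement from all in-sync replicas
        // Hypothetical value: unique per producer instance and stable across restarts.
        p.put("transactional.id", "orders-processor-1");
        return p;
    }

    // Consumer side: read only committed transactional data; offsets are committed
    // through the producer's transaction instead of by auto-commit.
    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("isolation.level", "read_committed");
        p.put("enable.auto.commit", "false");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("producer: " + producerProps());
        System.out.println("consumer: " + consumerProps());
    }
}
```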

It's much more common to build services on at-least-once semantics. With this approach you can receive (or process) the same message more than once, so you need a way to deduplicate messages (the same idea behind idempotency in HTTP APIs). You need something unique in each message, and you need to record that this ID has already been processed. If the payload has nothing you can use for deduplication, you can add a header to the message and use that.
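A minimal sketch of such a deduplicating (idempotent) consumer in plain Java. Assumptions: the Message record and its id field are hypothetical stand-ins for whatever unique key the payload or a header carries, and the in-memory HashSet stands in for a durable store. In a real service, recording the ID and performing the side effect should happen in one transaction, for example against a database table with a unique constraint on the ID.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class IdempotentConsumer {
    // Hypothetical message shape: id is the unique key taken from the payload or a header.
    record Message(String id, String payload) {}

    // Stand-in for a durable store of processed IDs.
    private final Set<String> processedIds = new HashSet<>();
    // Side effects that actually happened, so duplicates would be easy to spot.
    final List<String> results = new ArrayList<>();

    // Processes the message unless its ID was seen before; returns false for duplicates.
    boolean handle(Message m) {
        if (processedIds.contains(m.id())) {
            return false; // already processed: skip the side effect
        }
        results.add(m.payload().toUpperCase(Locale.ROOT)); // the "real" processing
        processedIds.add(m.id()); // with a real store, commit this together with the side effect
        return true;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        List<Message> delivered = List.of(
                new Message("42", "a"),
                new Message("43", "b"),
                new Message("42", "a")); // redelivered, e.g. after a rebalance
        for (Message m : delivered) {
            consumer.handle(m);
        }
        System.out.println(consumer.results); // [A, B]
    }
}
```

Replaying the same messages again, for example after an offset reset, leaves results unchanged.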

This also helps if you ever have to reset the offsets: the service can then reprocess old messages without breaking anything.

I suggest searching for details on how to implement the above. Confluent has a blog post about exactly-once semantics, "Improved Robustness and Usability of Exactly-Once Semantics in Apache Kafka", and the Kafka documentation explains the different delivery semantics.

Regarding the ConsumerRebalanceListener: you don't need to do anything with it if you make the consumer idempotent. Rebalances also happen when an application crashes, and in that case the service may have processed some records without having committed their offsets to Kafka yet.

A small tip I give everyone who is starting out with Kafka: it looks simple from the outside, but it's a complex technology. Don't use it in production until you know the nitty-gritty details of how it works, including having done a good amount of negative testing (unless you are OK with losing data).
