
Getting duplicates in kafka consumer

I am writing a Java client for a Kafka consumer. I commit every message asynchronously before processing it, yet I still receive many duplicate messages during a rebalance.
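For context, here is a minimal sketch of the loop I am describing (broker address, topic name, and group id are placeholders; my real code differs only in the processing logic):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class AsyncCommitLoop {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // manual commits
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));        // placeholder
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> record : records) {
                        // Commit this record's offset asynchronously BEFORE processing it.
                        consumer.commitAsync(Collections.singletonMap(
                                new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1)), null);
                        process(record); // application-specific processing
                    }
                }
            }
        }

        private static void process(ConsumerRecord<String, String> record) {
            System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
    }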

Can anyone explain the reason and how to avoid this?

A Kafka consumer does not provide exactly-once processing guarantees, even if you commit every message synchronously.

The problem is that a rebalance can happen after you have finished processing a message but right before you commit it. In that case the commit never takes effect, and the already processed message will be redelivered and reprocessed.

Because you use asynchronous commits, the number of duplicates increases further: commits do not take effect immediately for each single message, so you can have many messages "in flight" that are finished processing but not yet committed. On a rebalance, all of these "in-flight" messages will be reprocessed.

Committing synchronously therefore reduces the number of duplicates. Duplicates cannot be avoided completely, though, because Kafka provides no exactly-once delivery guarantee.
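As an illustration, here is a sketch of the synchronous variant, committing each record after processing it (it reuses the consumer setup from the sketch in the question; the per-record commit shown here is one option, committing per batch is another):

    import java.time.Duration;
    import java.util.Collections;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class SyncCommitLoop {
        // Assumes `consumer` was created with auto-commit disabled,
        // as in the sketch in the question.
        static void consumeLoop(KafkaConsumer<String, String> consumer) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // process first ...
                    // ... then commit synchronously: the call blocks until the
                    // broker acknowledges, so at most one processed-but-uncommitted
                    // record exists at any time, and a rebalance can redeliver
                    // at most that single record per partition.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1))); // next offset to read
                }
            }
        }

        static void process(ConsumerRecord<String, String> record) {
            System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
    }

Note that a blocking commit after every record costs throughput; many applications commit once per poll() batch instead and accept a somewhat larger duplication window.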

Exactly-once delivery is on the roadmap for a future Kafka release, though: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
