
Getting duplicates in Kafka consumer

I am writing a Java client for a Kafka consumer. I commit every message asynchronously before processing it. Still, I am receiving many duplicate messages during rebalance.
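Roughly, my consumer loop looks like the following sketch (the broker address, group id, and topic name are placeholders):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class AsyncCommitBeforeProcessing {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");                // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // Commit this record's offset asynchronously BEFORE processing it
                    consumer.commitAsync(
                            Collections.singletonMap(
                                    new TopicPartition(record.topic(), record.partition()),
                                    new OffsetAndMetadata(record.offset() + 1)),
                            null);
                    process(record); // duplicates show up here after a rebalance
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processing offset %d: %s%n", record.offset(), record.value());
    }
}
```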

Can anyone explain the reason and how to avoid this?

Kafka Consumer does not provide exactly-once processing guarantees, even if you commit all messages synchronously.

The problem is that even when you finish processing a message successfully and want to commit it, a rebalance can happen right before the commit. In that case your commit does not go through, and the already-processed message will be re-processed.

Because you use asynchronous commits, the number of duplicates increases, as committing does not happen immediately for each single message. Hence, you can have many "in-flight" messages that have finished processing but are not yet committed. On rebalance, all "in-flight" messages will be re-processed.
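If you want to stay with asynchronous commits, a common mitigation (not part of the guarantee discussion above, and just a rough sketch that assumes you track the offsets of processed records yourself) is to synchronously flush those offsets in a ConsumerRebalanceListener before the partitions are taken away:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class RebalanceAwareCommitter {

    // Offsets of records that are finished processing but not yet committed.
    private final Map<TopicPartition, OffsetAndMetadata> pendingOffsets = new HashMap<>();

    void subscribe(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Before losing the partitions, synchronously commit everything that has
                // already been processed, so it is not re-delivered after the rebalance.
                consumer.commitSync(pendingOffsets);
                pendingOffsets.clear();
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // nothing to do on assignment in this sketch
            }
        });
    }

    // Call this after each record has finished processing.
    void markProcessed(ConsumerRecord<String, String> record) {
        pendingOffsets.put(
                new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
    }
}
```

This only narrows the window: a record processed after the last markProcessed call, or during a crash rather than a clean rebalance, can still be re-delivered.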

So committing synchronously will reduce the number of duplicates. However, duplicates cannot be avoided completely, because there is no exactly-once delivery guarantee in Kafka.
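As a minimal illustration of the synchronous approach (broker address, group id, and topic are placeholders), processing the batch first and committing afterwards keeps at most one poll's worth of records "in flight":

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SyncCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");                // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                // Commit synchronously AFTER processing: a rebalance can now
                // re-deliver at most the records of the current batch.
                if (!records.isEmpty()) {
                    consumer.commitSync();
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processed offset %d: %s%n", record.offset(), record.value());
    }
}
```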

Exactly-once delivery is on the roadmap for a future release of Kafka, though: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
