Apache Kafka: seek and assignment. Reliable read from beginning

I'm trying to implement a basic scenario, re-reading a topic from the beginning (at least 1 message), and I'm facing unexpected behavior.

Suppose there is a topic with 1 partition holding exactly 1 million messages, 1 consumer with an offset already committed somewhere in the middle, and no active producers.

First I've tried

  consumer.subscribe(Collections.singletonList(topic));
  consumer.seekToBeginning(Collections.emptySet());
  consumer.poll(Duration.ofMillis(longTimeout)); //no loop to simplify

And that doesn't work (no messages are polled). I've read that seekToBeginning is lazy (and that's fine), but it turns out seekToBeginning has no effect here at all, because it needs the partitions to already be assigned, which only happens on the first poll. Should this be described in the docs, or have I missed it?
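
(For illustration, this is not part of my original code: the problem is easy to see by checking the assignment right after subscribe. The empty-collection form of seekToBeginning means "all currently assigned partitions", and at this point nothing is assigned yet, so there is nothing to mark for reset.)

  consumer.subscribe(Collections.singletonList(topic));
  System.out.println(consumer.assignment());         // prints [] -- nothing assigned before the first poll
  consumer.seekToBeginning(Collections.emptySet());   // "all assigned" resolves to the empty set, so nothing is marked for reset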

Then I've tried

  consumer.subscribe(Collections.singletonList(topic));
  consumer.poll(Duration.ofMillis(assignTimeout));
  consumer.seekToBeginning(Collections.emptySet());
  consumer.poll(Duration.ofMillis(longTimeout)); //no loop to simplify

And it turns out this depends on assignTimeout: it has to be long enough for the group join to complete. That time can vary, so it's not something that can be relied on.
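
In principle one could avoid guessing assignTimeout by polling until the assignment actually shows up before seeking, something like the sketch below (illustrative only; the 100 ms poll interval is arbitrary). The catch is that the interim polls may already return records from the committed offset, which this sketch simply discards:

  consumer.subscribe(Collections.singletonList(topic));
  while (consumer.assignment().isEmpty()) {
    // drives the group join/rebalance; any records returned here still start
    // from the committed offset and are dropped
    consumer.poll(Duration.ofMillis(100));
  }
  consumer.seekToBeginning(consumer.assignment());
  consumer.poll(Duration.ofMillis(longTimeout));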

Then I provided a ConsumerRebalanceListener with

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
      consumer.seekToBeginning(partitions);
    }

and left a single poll. And it finally seems to work.
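
For completeness, the full shape of what I ended up with looks roughly like this (a sketch; the bootstrap server, group id, topic name and timeout are placeholders, not my real configuration):

  import java.time.Duration;
  import java.util.Collection;
  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerConfig;
  import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;
  import org.apache.kafka.common.serialization.StringDeserializer;

  String topic = "my-topic";        // placeholder
  long longTimeout = 60_000L;       // placeholder

  Properties props = new Properties();
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                  // placeholder
  props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
  props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

  KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
  consumer.subscribe(Collections.singletonList(topic), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
      // nothing to do for this scenario
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
      // called from inside poll(), after the partitions are assigned but before
      // records are fetched, so the first records returned come from the beginning
      consumer.seekToBeginning(partitions);
    }
  });
  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(longTimeout));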

So the questions are:

  1. Is seekToBeginning right after subscribe useless? Should it be documented?
  2. Is the solution with ConsumerRebalanceListener reliable? Does it guarantee that no messages from the middle (the committed offset) will be polled before the seek applies?

For the first question:

You've rightly mentioned in your question that the prerequisite for seek() or seekToXXXX() operations is that the partitions need to be assigned. That does not happen until the consumer joins the group, and joining only happens when poll() is called. So seek() not taking effect immediately after subscribe() is the expected behaviour.

This is actually documented in Kafka: The Definitive Guide, Chapter 4 (Kafka Consumers), in the section "Consuming Records with Specific Offsets".
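
(As a side note, and only if the consumer-group machinery is not actually needed for this one-off re-read: with manual partition assignment there is no group join to wait for, so the seek can be issued before the first poll(). A sketch, assuming a freshly created consumer, since subscribe() and assign() are mutually exclusive on the same instance, plus the topic and longTimeout from the question; it additionally needs java.util.List, java.util.stream.Collectors and org.apache.kafka.common.TopicPartition.)

  List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
      .map(info -> new TopicPartition(topic, info.partition()))
      .collect(Collectors.toList());
  consumer.assign(partitions);           // takes effect immediately, no rebalance involved
  consumer.seekToBeginning(partitions);  // will be applied on the next poll
  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(longTimeout));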

For the second question:

Yes, using ConsumerRebalanceListener is reliable and is the approach recommended by Kafka: The Definitive Guide.

Here's the statement from the same chapter that confirms this:

There are many different ways to implement exactly-once semantics ..., but all of them will need to use the ConsumerRebalanceListener and seek() to make sure offsets are stored in time and that the consumer starts reading messages from the correct location.

Hope this helps!
