
How does a single Kafka consumer, consuming from two different partitions of the same topic, keep track of each partition's offset?

My doubt is:

  • Let's say I have one consumer, consuming from two partitions at the same time.
  • You can assume it is the sole consumer from a consumer group, consuming from a topic with two partitions.
  • For some reason the consumer is slow consuming from partition 0 and fast consuming from partition 1.

How does the Kafka Consumer API handle the different offsets from each partition? Let's say I restart my consumer now. How does my consumer know the offsets from each partition so that it can resume where it left off?

An example would be great to shed some light on this matter. Any link with code demonstrating this scenario will be greatly appreciated.

Thanks!

The consumer offset records how far a consumer group has read in a partition. Every message in a partition has a sequential offset, and offsets are tracked per partition, so a single consumer reading two partitions has two independent positions. Keeping track of the offset, or position, is important for nearly all Kafka use cases and can be an absolute necessity in certain instances, such as financial services.

The Kafka consumer offset allows processing to continue from where it last left off if the stream application is turned off or if there is an unexpected failure. In other words, by having the offsets persist in a data store (Kafka and/or ZooKeeper), data continuity is retained even when the stream application shuts down or fails.

Kafka keeps committed offsets in an internal topic named __consumer_offsets, keyed by consumer group, topic, and partition. Internal topics like this one carry the __ prefix and show up in the cluster's topic list.
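To make the per-partition bookkeeping concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka client: it only assumes that a committed offset is stored under a (group, topic, partition) key, which mirrors how __consumer_offsets is organized. The names OffsetStore, commit, and committed are hypothetical, as are the group and topic names.

```python
from typing import Dict, Optional, Tuple

# Key: (group id, topic, partition). Value: offset of the NEXT record to read.
OffsetKey = Tuple[str, str, int]


class OffsetStore:
    """Toy stand-in for the __consumer_offsets topic (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._offsets: Dict[OffsetKey, int] = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        # Each partition has its own entry, so commits never overwrite each other.
        self._offsets[(group, topic, partition)] = next_offset

    def committed(self, group: str, topic: str, partition: int) -> Optional[int]:
        # None means this group has never committed for this partition.
        return self._offsets.get((group, topic, partition))


store = OffsetStore()

# The consumer is slow on partition 0 and fast on partition 1.
store.commit("my-group", "my-topic", 0, 3)    # processed offsets 0..2
store.commit("my-group", "my-topic", 1, 120)  # processed offsets 0..119

# After a "restart", each partition resumes from its own committed offset.
resume_positions = {p: store.committed("my-group", "my-topic", p) for p in (0, 1)}
print(resume_positions)  # {0: 3, 1: 120}
```

Note that the committed value is the offset of the next record to read, that is, the last processed offset plus one. This is why the slow partition and the fast partition can sit at very different positions without interfering with each other.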

Determining Kafka Consumer Offset

New Consumer Groups

Initially, when a Kafka consumer starts for a new topic, the offset begins at zero (0). Easy enough.

On the other hand, if a new consumer group is started on an existing topic, then there is no stored offset for it. In this scenario, the consumer starts either from the beginning of the topic or from the end of the topic, depending on the consumer's auto.offset.reset setting (earliest or latest). The beginning of a topic gives the smallest available offset. The end of the topic gives the greatest offset.

Whether you start at the beginning or end of a topic is determined by your use case. If you start the offset at the beginning of a topic, then you will be replaying data. This approach is good for building out a new server and populating it with data, or for doing load testing on a Kafka cluster. If your needs don't require any of those functions, then you likely will want to start at the end of the topic.
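The choice described above can be sketched as a small function. This is an illustration in plain Python, not Kafka client code: the function name starting_offset and its parameters are hypothetical, and it models only the earliest and latest values of the consumer's auto.offset.reset setting. A stored offset, if one exists, takes precedence over the reset policy.

```python
from typing import Optional


def starting_offset(committed: Optional[int], earliest: int, latest: int,
                    reset_policy: str = "latest") -> int:
    """Pick where a consumer starts reading one partition (simplified model)."""
    if committed is not None:
        return committed          # existing group: resume where it left off
    if reset_policy == "earliest":
        return earliest           # new group: replay from the start
    if reset_policy == "latest":
        return latest             # new group: only read new records
    raise ValueError(f"unsupported reset policy: {reset_policy}")


# New group on a partition holding offsets 0..99 (log end offset is 100).
print(starting_offset(None, earliest=0, latest=100, reset_policy="earliest"))  # 0
print(starting_offset(None, earliest=0, latest=100, reset_policy="latest"))    # 100
# A group with a committed offset of 12 ignores the reset policy.
print(starting_offset(12, earliest=0, latest=100, reset_policy="earliest"))    # 12
```

The same decision is made independently for every partition the consumer is assigned.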

Existing Consumer Groups

What about existing consumer groups? Let's say, for instance, that a consumer group consumes 12 messages before failing. When the consumer starts up again, it will continue from where it left off in each partition, because that offset is stored by Kafka and/or ZooKeeper, depending on the Kafka version used in the cluster. Reference: https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum

If you are ever curious about where the offset is, you can run the kafka-consumer-groups tool with the --describe option. This tool reports the offset and lag of a consumer group for each topic and partition it reads. Keep in mind that the consumer has to be active when you run this command to see its current offset.
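The lag reported per partition is the partition's log end offset minus the group's current (committed) offset. A minimal sketch of that arithmetic in plain Python follows; the function name partition_lag and the sample numbers are made up for illustration.

```python
from typing import Dict


def partition_lag(log_end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset - committed offset."""
    return {
        partition: log_end_offsets[partition] - committed_offsets[partition]
        for partition in log_end_offsets
    }


# Both partitions hold 150 records; the consumer is slow on 0 and fast on 1.
lag = partition_lag({0: 150, 1: 150}, {0: 3, 1: 120})
print(lag)  # {0: 147, 1: 30}
```

A large lag on one partition and a small lag on another is exactly what the slow/fast scenario in the question would look like.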
