
Cost of Rebalancing partitions of a topic in Kafka

I am trying to come up with a design for consuming from Kafka. I am using Kafka 0.8.1.1. I am thinking of designing a system where a consumer is created every few seconds, consumes data from Kafka, processes it, and then quits after committing its offsets to Kafka. At any point in time I expect 250-300 consumers to be active (running as thread pools on different machines).

  1. How and when does a rebalance of partitions happen?

  2. How costly is the rebalancing of partitions among the consumers? I expect a consumer to finish or a new one to join the same consumer group every few seconds, so I want to know the overhead and latency of a rebalancing operation.

  3. Say consumer C1 has partitions P1, P2, and P3 assigned to it and is processing message M1 from partition P1. Now consumer C2 joins the group. How are the partitions divided between C1 and C2? Is it possible that C1's commit for M1 (which might take some time to reach Kafka) is rejected, so that M1 is treated as a fresh message and delivered to another consumer? (I know Kafka follows an at-least-once delivery model, but I want to confirm whether a rebalance can cause the same message to be redelivered.)

I'd rethink the design if I were you. Perhaps you need a consumer pool?

  1. Rebalancing happens every time a consumer joins or leaves the group.

  2. Kafka and the current consumer were definitely designed for long-running consumers. The new consumer design (planned for 0.9) will handle short-lived consumers better. Rebalances take 100-500 ms in my experience, depending a lot on how ZooKeeper is doing.

  3. Yes, duplicates happen often during rebalancing. That's why we try to avoid rebalances. You can try to work around the duplicates by committing offsets more frequently, but with 300 consumers committing frequently and a lot of consumers joining and leaving, your ZooKeeper may become a bottleneck.
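
To make points 1 and 3 more concrete, here is a rough Python sketch. It is not Kafka code: it only models a range-style split of the kind the 0.8 high-level consumer uses as far as I understand it (sort partitions and consumers, hand each consumer a contiguous slice), plus the window of messages that get read again when a partition changes owner. The names `range_assign` and `redelivered_offsets`, and the offsets 40 and 44, are made up for illustration.

```python
def range_assign(partitions, consumers):
    """Range-style split: sort both lists, give each consumer a contiguous slice.

    Assumption: this mirrors the general shape of the 0.8 high-level consumer's
    assignment; the real code works on consumer thread ids stored in ZooKeeper.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    for pos, consumer in enumerate(consumers):
        # The first `extra` consumers each take one additional partition.
        start = per_consumer * pos + min(pos, extra)
        count = per_consumer + (1 if pos < extra else 0)
        assignment[consumer] = partitions[start:start + count]
    return assignment


def redelivered_offsets(committed_offset, processed_up_to):
    """Offsets the new owner reads again after a partition moves.

    committed_offset: next offset to read according to the offset store.
    processed_up_to: last offset the old owner processed before losing the partition.
    """
    return list(range(committed_offset, processed_up_to + 1))


# C1 alone, then C2 joins: every join or leave triggers a new assignment.
before = range_assign(["P1", "P2", "P3"], ["C1"])
after = range_assign(["P1", "P2", "P3"], ["C1", "C2"])
print(before)  # {'C1': ['P1', 'P2', 'P3']}
print(after)   # {'C1': ['P1', 'P2'], 'C2': ['P3']}

# Partitions that changed owner are where duplicates can show up.
moved = sorted(set(before["C1"]) - set(after["C1"]))
print(moved)  # ['P3']

# Hypothetical numbers: C1 committed offset 40 on P3 but had processed up to 44.
print(redelivered_offsets(40, 44))  # [40, 41, 42, 43, 44]
```

In this toy split P1 happens to stay with C1, so M1 itself would not move, but anything C1 had processed and not yet committed on P3 would be delivered again to C2. With consumers joining and leaving every few seconds, that window reopens on every rebalance.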
