
How to scale Kafka stream processing dynamically?

I have a fixed number of partitions for a topic. Producers produce data at varying rates during different hours of the day.

I want to add consumers dynamically, based on the hour of the day, so that records are processed as fast as possible.

For example, say I have 10 partitions of a topic. I want to deploy 5 consumers during off-peak hours and 20 consumers during peak hours.

My problem is that when I have 20 consumers, each consumer will receive duplicate records, which I want to avoid. I want each record to be processed only once so that processing speeds up.

Is there any mechanism to do this?

If you have N partitions, then you can have up to N consumers within the same consumer group, each reading from a single partition. When you have fewer consumers than partitions, some consumers will read from more than one partition. Conversely, if you have more consumers than partitions, the surplus consumers will be inactive and will receive no messages at all.
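As an illustration, here is a minimal Java consumer sketch; the broker address, topic name, and group name (localhost:9092, my-topic, my-processing-group) are placeholders. Every instance started with the same group.id participates in the partition assignment described above, and sharing the group id is also what prevents duplicate consumption:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            // Every instance that shares this group.id joins the same consumer
            // group, so each partition is assigned to exactly one of them.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-processing-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }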

Therefore, if you want to kick off 20 consumers, you need to increase the number of partitions of the topic to at least 20; otherwise, 10 of your consumers will be inactive.
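Partitions can be added with the kafka-topics.sh CLI or programmatically. As a sketch using the same placeholder broker and topic names, the Java AdminClient can do it. Note that Kafka only allows increasing the partition count, never decreasing it, and that adding partitions changes the key-to-partition mapping for future records:

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitions;

    public class IncreasePartitions {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            try (AdminClient admin = AdminClient.create(props)) {
                // Grow "my-topic" to 20 partitions so 20 consumers can all be active.
                admin.createPartitions(Map.of("my-topic", NewPartitions.increaseTo(20)))
                     .all()
                     .get(); // block until the brokers apply the change
            }
        }
    }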

With regard to the duplicates you've mentioned: as long as all of your consumers belong to the same consumer group, each message will be consumed only once.

To summarise,

  1. Increase the number of partitions of your topic to 20.
  2. Build a mechanism that creates and kills consumers based on peak/off-peak hours, and make sure that every consumer you kick off joins the existing consumer group so that each message is consumed only once (a minimal sketch follows this list).
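Here is one minimal sketch of such a mechanism, assuming a single JVM, the placeholder names used above, and a hypothetical peak window of 09:00-18:00. In production this is more commonly done by scaling container replicas via an orchestrator, but the group-rebalancing behaviour is the same:

    import java.time.Duration;
    import java.time.LocalTime;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.errors.WakeupException;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ConsumerScaler {

        /** One consumer on its own thread; KafkaConsumer is not thread-safe,
         *  so wakeup() is the only method another thread may call on it. */
        static class ConsumerTask extends Thread {
            private final KafkaConsumer<String, String> consumer;

            ConsumerTask() {
                Properties props = new Properties();
                props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
                props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-processing-group");     // shared group
                props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
                props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
                consumer = new KafkaConsumer<>(props);
            }

            @Override
            public void run() {
                try {
                    consumer.subscribe(List.of("my-topic")); // placeholder topic
                    while (true) {
                        consumer.poll(Duration.ofMillis(500))
                                .forEach(r -> System.out.println(r.value())); // process records here
                    }
                } catch (WakeupException e) {
                    // expected: wakeup() aborts the blocking poll() on shutdown
                } finally {
                    consumer.close(); // leaves the group, triggering a rebalance
                }
            }

            void shutdown() {
                consumer.wakeup();
            }
        }

        private final Deque<ConsumerTask> running = new ArrayDeque<>();

        // Hypothetical peak window 09:00-18:00; adjust to your traffic pattern.
        private int desiredCount() {
            int hour = LocalTime.now().getHour();
            return (hour >= 9 && hour < 18) ? 20 : 5;
        }

        private synchronized void rescale() {
            int target = desiredCount();
            while (running.size() < target) {   // scale up: new members join the group
                ConsumerTask task = new ConsumerTask();
                task.start();
                running.push(task);
            }
            while (running.size() > target) {   // scale down: members leave cleanly
                running.pop().shutdown();
            }
        }

        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(new ConsumerScaler()::rescale, 0, 5, TimeUnit.MINUTES);
        }
    }

Each start or clean shutdown of a consumer triggers a group rebalance, during which processing briefly pauses, so rescaling infrequently (here, checking every 5 minutes) is preferable to reacting to every small load change.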
