简体   繁体   中英

Apache Kafka Scaling Topics using partitions

We started to use Apache Kafka to persist Timeseries data into a Timeseries database. What we started with was to just have a single topic, a producer writing to this topic and a single consumer reading from this topic and dumping the data to the Timeseries database.

We had 3 broker instances and what we noticed in the first try was that the producer was pretty fast in writing messages to the topic. Within a matter of 30 minutes, we had around 1.5 million messages. The consumer was just doing 300 messages per second.

Our next approach was to partition the topic and have more consumer instances (equal to the number of partitions). This definitely improved on the consumer write speed. Now my questions are:

  1. What happens if I set my topic partition to 6, but I have only 3 broker instances. Which broker instance would be the leader for partition 1 to 6?

  2. Is there a formula to determine how many partitions would I be needing? Since this was our test environment, we could play with it and scale it. We might not be able to do the same on our production environment. So how to determine the partition size?

The partitions get distributed amongst your brokers. It's impossible to know which broker will be elected leader of a given partition -- and it can change over time. Depending on which version of Kafka and which Consumer API you use, your consumer may or may not discover partition leaders on its own. With the SimpleConsumer you have to find partition leaders on your own, and respond to new leader election in your code (instead of having it handled by the API automatically).

As to the number of partitions -- there's no real "formula" other than this: you can have no more parallelism than you have partitions. If you have 4 partitions and 5 consumers, one of the consumers will starve. I usually use numbers like 12 or 60 or multiples thereof for the number of partitions for large topics. Something that divides easily and cleanly among variable numbers of consumers.

Also, note that you can later on change the number of partitions, with some caveats. See this answer for how and what the caveats are.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM