
Kafka topic partitions

A quick question concerning Kafka topics and partitioning. Suppose the following scenario:

  • Producer1 writes data into Topic1.

  • Producer2 writes data into Topic2.

  • Consumer1 reads data from Topic1 and Topic2.

  • Consumer2 reads data only from Topic2.

The question is: how many partitions are there inside each topic? Is it true that it depends on the number of consumers, to promote parallelism? Or is it just a parameter set in the file server.config? In the latter case, is there a way to have different topics with different numbers of partitions?

The first thing to understand is that a topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. On the consumer side, Kafka always gives a single partition's data to one consumer thread. Thus, the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed. Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.

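How many partitions are there inside each topic? That is configurable per topic. The broker setting num.partitions in server.properties is only the default used when a topic is created without an explicit partition count (for example, by auto-creation). You can increase the partition count of an existing topic, but you cannot decrease it afterwards. The kafka-topics.sh tool provides an alter command to change topic settings; it is used here to add more partitions to an existing topic.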

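Here is the command to increase the partition count of topic 'my-topic' to 20 (on Kafka 2.2 and later, pass --bootstrap-server with a broker address such as localhost:9092 in place of --zookeeper localhost:2181; the --zookeeper option was removed in Kafka 3.0):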

./bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic my-topic --partitions 20

You can verify that the partition count has increased with the describe command:

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic

How many partitions should you set for a topic? This well-written article covers it: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
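As a rough illustration of the sizing rule in the linked post: measure the throughput a single partition sustains for producing (p) and for consuming (c), then for a target throughput t plan for at least max(t/p, t/c) partitions. The sketch below only does that arithmetic; the function name and the example figures are hypothetical, and real measurements depend on your hardware, message size, and replication settings.

```python
import math


def min_partitions(target_mb_s, produce_mb_s_per_partition, consume_mb_s_per_partition):
    """Lower bound on partitions for a target throughput: max(t/p, t/c), rounded up."""
    if min(target_mb_s, produce_mb_s_per_partition, consume_mb_s_per_partition) <= 0:
        raise ValueError("all throughput values must be positive")
    needed = max(
        target_mb_s / produce_mb_s_per_partition,
        target_mb_s / consume_mb_s_per_partition,
    )
    # A topic always has at least one partition.
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # Hypothetical figures: 100 MB/s target, 10 MB/s produce and 20 MB/s consume per partition.
    print(min_partitions(100, 10, 20))
```

This gives only a lower bound for throughput; the article also discusses the costs of having too many partitions.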

You can specify the number of partitions when you create a topic (the --partitions option of kafka-topics.sh --create), so different topics can have different partition counts. For example, suppose you have created Topic1 with 40 partitions and you start just one consumer. This consumer will be assigned every partition of Topic1.

If you want to increase parallelism, you can start several consumers in the same consumer group. For example, starting 10 consumers with the same consumer group id results in each consumer being assigned approximately 4 partitions.

Note that starting more consumers in a consumer group than the topic has partitions makes no sense: the extra consumers will be idle.
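The arithmetic above can be illustrated with a small simulation. This is a simplified range-style split written for illustration, not Kafka's actual assignor code; the function name and consumer ids are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Split partition ids 0..num_partitions-1 into contiguous ranges, one per consumer."""
    members = sorted(consumers)
    if not members:
        return {}
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members take one additional partition each.
        count = base + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    group = [f"consumer-{i:02d}" for i in range(10)]
    for member, partitions in assign_partitions(40, group).items():
        print(member, partitions)
```

With 40 partitions and 10 consumers every consumer receives 4 partitions. Calling the same function with, say, 3 partitions and 5 consumers leaves two consumers with an empty list, which corresponds to the idle consumers mentioned above.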

For more information take a look at the official Kafka documentation: https://kafka.apache.org/documentation/#intro_consumers
