Kafka topic partitions

A quick question about Kafka topics and partitioning. Suppose the following scenario:

  • Producer1 writes data into Topic1.

  • Producer2 writes data into Topic2.

  • Consumer1 reads data from Topic1 and Topic2.

  • Consumer2 reads data only from Topic2.

The question is: how many partitions are there inside each topic? Does it depend on the number of consumers, in order to promote parallelism? Or is it just a parameter set in the file server.config? In the latter case, is there a way to have different topics with different numbers of partitions?

The first thing to understand is that a topic partition is the unit of parallelism in Kafka. On both the producer and the broker side, writes to different partitions can be done fully in parallel. On the consumer side, Kafka always gives a single partition's data to one consumer thread. Thus, the degree of parallelism in the consumer (within a consumer group) is bounded by the number of partitions being consumed. Therefore, in general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve.
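
To make that bound concrete, here is a minimal sketch in plain Python (no Kafka client involved). The function name `max_consumer_parallelism` is hypothetical; it only models the rule that, within one consumer group, a partition is read by a single consumer.

```python
def max_consumer_parallelism(num_partitions: int, num_consumers: int) -> int:
    """Return how many consumers in one group can actually receive data.

    Assumption: each partition is owned by exactly one consumer in the
    group, so no more than `num_partitions` consumers can be active.
    """
    if num_partitions < 0 or num_consumers < 0:
        raise ValueError("counts must be non-negative")
    return min(num_partitions, num_consumers)


if __name__ == "__main__":
    # 20 partitions, 5 consumers: all 5 consumers are busy.
    print(max_consumer_parallelism(20, 5))
    # 20 partitions, 50 consumers: only 20 can be busy.
    print(max_consumer_parallelism(20, 50))
```

Adding consumers beyond the partition count does not raise the result, which is why the partition count is the knob to turn when more consumer-side parallelism is needed.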

How many partitions are there inside each topic? That's configurable, per topic. You can increase the partition count of a topic, but once increased, you cannot decrease it. Apache Kafka provides an alter command to change topic behavior and to add or modify configurations. We will use the alter command to add more partitions to an existing topic.

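Here is the command to increase the partition count for topic 'my-topic' to 20. (This form targets older releases: since Kafka 2.2 the tool also accepts --bootstrap-server with a broker address such as localhost:9092 in place of --zookeeper, and the --zookeeper option was removed in Kafka 3.0.)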

./bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic my-topic --partitions 20

You can verify that the partitions have been added by using the describe command as follows:

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic

How many partitions should you set for a topic? This well-written article covers it: https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
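
As a rough illustration of the sizing rule of thumb described in that article (as I read it: measure the throughput a single partition can sustain for producing and for consuming, then take at least max(t/p, t/c) partitions for a target throughput t), here is a small sketch. The function name and the throughput figures are made up for the example, not measurements.

```python
import math


def min_partitions(target: float, per_partition_produce: float,
                   per_partition_consume: float) -> int:
    """Lower bound on the partition count: max(t/p, t/c), rounded up.

    All three arguments must use the same unit (for example MB/s).
    The per-partition figures are assumed to come from your own benchmarks.
    """
    if min(target, per_partition_produce, per_partition_consume) <= 0:
        raise ValueError("throughput values must be positive")
    needed = max(target / per_partition_produce,
                 target / per_partition_consume)
    return math.ceil(needed)


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB/s target, 10 MB/s produce and
    # 20 MB/s consume per partition.
    print(min_partitions(100, 10, 20))
```

This only gives a starting point; the article also discusses the costs of having very many partitions.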

You can specify the number of partitions on topic creation. For example, say you have created Topic1 with 40 partitions. Now you start just one consumer. This consumer will be assigned every partition of Topic1.

If you want to increase parallelism, you can start several consumers in a consumer group. For example, starting 10 consumers with the same consumer group id leads to every consumer being assigned approximately 4 partitions.

FYI, starting more consumers (in a consumer group) than the number of partitions you have makes no sense: some consumers will be idle.
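
The two cases above (40 partitions over 10 consumers, and more consumers than partitions) can be sketched with a toy simulation. This is a simplified round-robin stand-in, not Kafka's actual assignment logic, which is pluggable (range, round-robin and sticky assignors); the function and consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int,
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partition ids 0..num_partitions-1 over one group's consumers.

    Simplified round-robin: each partition gets exactly one owner.
    """
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    group = [f"consumer-{i}" for i in range(10)]
    balanced = assign_partitions(40, group)
    # Every consumer ends up with 4 of the 40 partitions.
    print({c: len(parts) for c, parts in balanced.items()})

    small = assign_partitions(3, [f"consumer-{i}" for i in range(5)])
    # Consumers left with an empty list are idle.
    print([c for c, parts in small.items() if not parts])
```

With a single consumer in the list, the same function hands it all 40 partitions, matching the one-consumer case described earlier.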

For more information, take a look at the official Kafka documentation: https://kafka.apache.org/documentation/#intro_consumers
