
What is the ideal number of partitions in a Kafka topic?

I am learning Kafka and trying to create a topic for my recent-search application. The volume of data pushed to the Kafka topic is expected to be high.

My Kafka cluster has 3 brokers, and there are already topics created for other requirements.

Now, how many partitions should I choose for my recent-search topic? What happens if I do not specify the partition count explicitly? What things need to be considered when choosing the partition count?

This will depend on the throughput of your consumers. If you are producing 100 messages a second and each consumer can process 10 messages a second, then you'll want at least 10 partitions (produce rate / consume rate) with 10 instances of your consumer. If you want this topic to be able to handle future growth, then you'll want to set the partition count even higher, so that you can add more consumer instances to handle the new volume.
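The sizing rule above is simple arithmetic: divide the produce rate by what one consumer can handle and round up, then multiply by whatever growth you want to leave room for. The following is a minimal sketch; the helper names `min_partitions` and `with_headroom` and the growth factor are hypothetical illustrations, not part of any Kafka API.

```python
import math


def min_partitions(produce_rate: float, consume_rate_per_consumer: float) -> int:
    """Smallest partition count that lets one-consumer-per-partition keep up."""
    if consume_rate_per_consumer <= 0:
        raise ValueError("consume rate must be positive")
    # Round up: 105 msg/s at 10 msg/s per consumer needs 11 consumers, not 10.
    return math.ceil(produce_rate / consume_rate_per_consumer)


def with_headroom(partitions: int, growth_factor: float) -> int:
    """Scale the partition count for expected growth (hypothetical factor, e.g. 2.0)."""
    return math.ceil(partitions * growth_factor)


base = min_partitions(100, 10)
print(base)                      # 10
print(with_headroom(base, 2.0))  # 20
```

The rounding up matters: a fractional result means the last consumer would otherwise fall behind.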

Another piece of advice would be to make your partition count a highly divisible number, so that you can scale consumers up or down while keeping their load balanced. For example, if you choose 10 partitions, you would need 1, 2, 5, or 10 consumer instances for each instance to process the same number of partitions. If you choose 12 partitions instead, you could stay balanced with 1, 2, 3, 4, 6, or 12 instances of your consumer.
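The balanced consumer counts in the example above are simply the divisors of the partition count. A short sketch (the function name `balanced_consumer_counts` is a hypothetical helper) that lists them for a candidate partition count:

```python
def balanced_consumer_counts(partitions: int) -> list[int]:
    """Consumer-instance counts that give every instance the same number of partitions."""
    return [n for n in range(1, partitions + 1) if partitions % n == 0]


for p in (10, 12):
    print(p, balanced_consumer_counts(p))
# 10 [1, 2, 5, 10]
# 12 [1, 2, 3, 4, 6, 12]
```

Comparing the output for a few candidate partition counts makes it easy to pick one that leaves the most balanced scaling options.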

I would evaluate two main things before deciding on the number of partitions.

  1. The first point is how the partitions and the consumers of a consumer group act together. In simple terms, one consumer can consume messages from more than one partition, but one partition can't be consumed by more than one consumer in the same consumer group. That means it makes sense to have number of partitions >= number of consumers in a consumer group. Otherwise, you will end up with consumers that have no partition assigned.

  2. The second point is what your requirements are from a latency vs. throughput point of view. In simple terms, latency is the time required to perform some action or to produce some result. Latency is measured in units of time: hours, minutes, seconds, nanoseconds, or clock periods. Throughput is the number of such actions executed or results produced per unit of time.
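To illustrate the first point, here is a simplified model of a range-style assignment for a single topic: each partition goes to exactly one consumer in the group, and any consumers beyond the partition count are left idle. This is an assumption-laden sketch for intuition only; `range_assign` is a hypothetical helper, not the Kafka client's actual assignor code.

```python
def range_assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Simplified range-style assignment of one topic's partitions to a consumer group.

    Each partition is owned by exactly one consumer; a consumer may own several.
    """
    if not consumers:
        raise ValueError("consumer group is empty")
    members = sorted(consumers)
    per_consumer, extra = divmod(partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers get one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(3, ["c1", "c2"]))
# {'c1': [0, 1], 'c2': [2]}
print(range_assign(3, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}  <- c4 sits idle
```

With 3 partitions, a fourth consumer in the group receives nothing, which is why the partition count caps useful consumer parallelism.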

Now, coming back to the comparison from Kafka's standpoint: in general, more partitions in a Kafka cluster lead to higher throughput. However, you should be careful with this number if you really need low latency: a message is only exposed to consumers after it has been replicated to all in-sync replicas, so more partitions per broker means more replication work and can add end-to-end latency.

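Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.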

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM