
Kafka does not fill partitions evenly in a topic

I want to have 1 topic with 10 partitions. I am using the default configuration of Kafka. I created 1 topic with 10 partitions using the helper script, and now I am about to produce messages to it.

The problem is that consumers seem to fetch data from only 5 partitions.

Let me describe it in more detail.

I know the common rule that you need one consumer thread per partition. I want to be able to commit offsets per partition, and this is possible only when I have 1 thread per consumer connector per partition (I am using the high-level consumer).

So I create 10 threads, and in each thread I call Consumer.createJavaConsumerConnector(), where I do this:

topicCountMap.put("mytopic", 1);

In the end I have 1 iterator, which consumes messages from 1 partition.

When I do this 10 times, I have 10 consumers, one consumer per thread per partition, so I can commit offsets independently per partition. If I put a number other than 1 in the topic count map, I would end up with more than 1 consumer thread for that topic on a single connector. Committing offsets with that connector instance would then commit them for all of its threads, and therefore for multiple partitions, which is not what I want.

But when I run the consumers, only 5 of them are involved. The other threads seem to be idle, and I do not know why.

The first possible reason is that even though I have 10 partitions, only 5 of them contain messages, so the other 5 consumers are idle. But I do not understand how messages could fail to be spread evenly across all partitions when I am using producers. I am sending about 1M messages, so if they are supposed to be spread evenly, every partition must contain at least some messages.
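To make the expectation above concrete, here is a minimal sketch of key-hash partitioning, assuming every message carries a distinct key and the partition is chosen as the non-negative key hash modulo the partition count. The class name, the helper method and the "key-N" key format are hypothetical, and the question does not say whether keys are used at all, so this is a simplified model rather than the producer's actual code.

```java
class HashPartitionSketch {

    // Hypothetical helper: counts how many of numMessages land in each
    // partition when the partition is (non-negative key hash) % numPartitions.
    static int[] countPerPartition(int numMessages, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numMessages; i++) {
            String key = "key-" + i;  // assumed key format, for illustration only
            // Mask the sign bit so the modulo result is never negative.
            int partition = (key.hashCode() & 0x7fffffff) % numPartitions;
            counts[partition]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerPartition(1_000_000, 10);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": " + counts[p]);
        }
    }
}
```

Under these assumptions every one of the 10 partitions receives messages. If messages are sent without a key, the old 0.8 producer, as far as I know, picks one partition and keeps sending to it until the next metadata refresh, so a keyless workload may look much less even than this model suggests.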

// EDIT

I managed to create 10 partitions in a topic, but I see only 7 consumers. That's just a mystery to me.

I am creating these consumer threads in a loop. I start the first thread (submit it to an executor service), then another, then another, and so on.

So the scenario is that the first consumer gets all 10 partitions, then the second one connects and the partitions are split between the two as 5 and 5 (or something similar), and then the other threads connect.

I understand this as partition rebalancing among all consumers. It behaves well in the sense that as more consumers are created, partitions are rebalanced among them, so every consumer should have some partitions to work on.

But from the results I see that there are only 7 consumers, and judging by the consumed messages they seem to be split 3, 2, 1, 1, 1, 1, 1 partition-wise. Yes, these 7 consumers cover all 10 partitions, but why do the consumers with more than 1 partition not split and give partitions to the remaining 3 consumers?

I am wondering what is happening with the remaining 3 threads, and why they do not "grab" partitions from the consumers that have more than 1 partition assigned.
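For comparison with the observed 3, 2, 1, 1, 1, 1, 1 split, the following sketch shows what a completed rebalance would be expected to produce, assuming a range-style assignment in which partitions are divided as evenly as possible among the consumers of one group and the first few consumers take one extra partition each. The class and method names are hypothetical, and this is a simplified model of the idea, not the consumer's actual rebalance code.

```java
import java.util.ArrayList;
import java.util.List;

class RangeAssignmentSketch {

    // Hypothetical helper: number of partitions each consumer would own
    // after an even, range-style split of numPartitions among numConsumers.
    static List<Integer> partitionsPerConsumer(int numPartitions, int numConsumers) {
        int base = numPartitions / numConsumers;   // every consumer gets at least this many
        int extra = numPartitions % numConsumers;  // the first 'extra' consumers get one more
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) {
            counts.add(base + (i < extra ? 1 : 0));
        }
        return counts;
    }

    public static void main(String[] args) {
        for (int consumers : new int[] {1, 2, 7, 10}) {
            System.out.println(consumers + " consumer(s): "
                    + partitionsPerConsumer(10, consumers));
        }
    }
}
```

In this model, 10 consumers would own exactly one partition each, and even with 7 consumers no consumer would own 3 partitions. An observed 3, 2, 1, 1, 1, 1, 1 split may therefore indicate that the last rebalance did not include all 10 consumers, though that is only a guess from the numbers in the question.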

I've seen similar behavior when I (accidentally) accessed the topic programmatically before creating it via the admin script. In that situation, the number of partitions, as well as the other topic configuration settings, defaults to the values in the broker configuration (for example, num.partitions in server.properties).


