
Multiple Kafka Producers writing to the same topic - how to load balance consumption

I have a design with multiple producers, P1, P2, P3, P4 ... PN, writing to a single topic T1 that has 32 partitions.

On the other side, I have up to 32 consumers in a single consumer group.

I would like to balance the message-consumption load across those consumers.

Reading the docs, I see three options:
1. Define the partition myself (drawback: I would have to know where the last message was sent, or define a partition range for each producer P).
2. Define a key and leave the partition decision to Kafka's hash algorithm (drawback: load balancing would be left to luck).

(As per Chris's answer, load balancing should be left to the hash algorithm.) In practice this does not give the consumers an equal distribution, because consumers are bound to partitions, and I would have to understand the hash algorithm to choose a good key. To me that sounds the same as picking the partition myself (and that choice would have to be distributed over the producers).

My current code uses a UUID as the key. An analysis of the partitions chosen, and consequently of which consumers are working, shows a distribution that can be far from equal. I'm reproducing it below:

[Image: messages received per partition]
The image above shows the number of messages received by each partition in a 5-minute window using a UUID as my key; at that point in time I had 8 consumers. Consumption takes about 2 minutes. The cells in red show a queue of 9 requests in one of the consumers, while other consumers had low load, or zero load like the consumer in green. If a random key is not a good option, what should I choose?

3. No partition and no key, leaving the choice to Kafka's round-robin algorithm (drawback: the round robin is internal to each producer, meaning all producers could be sending messages to the same partition). I also tested this option, and the result is below:

[Image: round robin is internal to the producer]
The image above shows that round robin is, apparently, internal to each producer.
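Option 1 above (a fixed partition range per producer) could look roughly like the Python sketch below. It only illustrates the scheme the question describes and is not a Kafka API: `partition_range` and `RangeRoundRobinChooser` are hypothetical names, and the sketch assumes every producer knows its own index and the total number of producers. The chosen number would then be set as the explicit partition of each record (the Java client's `ProducerRecord` has a constructor that takes a partition).

```python
import itertools


def partition_range(producer_index: int, num_producers: int, num_partitions: int) -> range:
    """Hypothetical scheme: give each producer a contiguous, non-overlapping slice of partitions."""
    if not 0 <= producer_index < num_producers:
        raise ValueError("producer_index must be in [0, num_producers)")
    if num_producers > num_partitions:
        # Some producers would get an empty slice and have nowhere to write.
        raise ValueError("need at least one partition per producer")
    base, extra = divmod(num_partitions, num_producers)
    # The first `extra` producers each get one additional partition.
    start = producer_index * base + min(producer_index, extra)
    size = base + (1 if producer_index < extra else 0)
    return range(start, start + size)


class RangeRoundRobinChooser:
    """Hypothetical helper: cycles through the partitions owned by one producer."""

    def __init__(self, producer_index: int, num_producers: int, num_partitions: int) -> None:
        self._cycle = itertools.cycle(
            partition_range(producer_index, num_producers, num_partitions)
        )

    def next_partition(self) -> int:
        return next(self._cycle)


if __name__ == "__main__":
    # P2 (index 1) of 4 producers on a 32-partition topic owns partitions 8..15.
    chooser = RangeRoundRobinChooser(producer_index=1, num_producers=4, num_partitions=32)
    print([chooser.next_partition() for _ in range(10)])  # [8, 9, 10, 11, 12, 13, 14, 15, 8, 9]
```

The drawback named in the question is visible here: adding or removing a producer changes `num_producers`, so every producer's slice must be recomputed, and per-partition balance still depends on each producer's send rate.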

Do I really need to write the overall load balancing algorithm myself? Am I missing something?

Balancing load across consumers is one of Kafka's defining features, and it is what allows horizontal scaling.

The record key used by the producer is what makes this work. The key determines which partition a message goes to, and each partition is consumed sequentially by one consumer in the group. Your producers should therefore use a key strategy that produces an even spread and, if ordering matters, ensures that related messages share the same key (bear in mind that there are further considerations around in-flight requests if strict ordering is critical).
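To make "the key determines which partition a message goes to" concrete, here is a Python port of the rule that, as far as we know, the Java client's default partitioner applies to records with a non-null key: a murmur2 hash of the serialized key bytes, masked to a non-negative value, modulo the partition count. It is an illustrative re-implementation, not Kafka's own code; `partition_for_key` is our own name, and the sketch assumes string keys serialized as UTF-8 (as `StringSerializer` does). Custom partitioners and non-Java clients may hash differently.

```python
def murmur2(data: bytes) -> int:
    """Python port of the 32-bit murmur2 variant used by Kafka's Java client (Utils.murmur2).

    Returns the hash as an unsigned 32-bit value; Java holds the same bits in a signed int.
    """
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask  # fixed seed mixed with the input length

    # Process the input four bytes at a time, little-endian.
    for i in range(0, length - length % 4, 4):
        k = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24)
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Mix in the remaining 1-3 bytes (mirrors the fall-through switch in the Java code).
    tail = length & ~3
    remaining = length % 4
    if remaining == 3:
        h ^= data[tail + 2] << 16
    if remaining >= 2:
        h ^= data[tail + 1] << 8
    if remaining >= 1:
        h ^= data[tail]
        h = (h * m) & mask

    # Final avalanche step.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: str, num_partitions: int) -> int:
    """Partition for a keyed record: non-negative murmur2 of the key bytes, modulo partition count."""
    key_bytes = key.encode("utf-8")  # assumption: keys are UTF-8 strings
    return (murmur2(key_bytes) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for key in ["order-17", "order-17", "order-18"]:
        print(key, partition_for_key(key, 32))
```

Because the result depends only on the key bytes and the partition count, messages that share a key always land on the same partition, which is what preserves their relative order.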

The even spread is what balances the load: there is no round robin among consumers. Partitions are simply shared out as evenly as possible among the consumers in each group, and those consumers poll independently. If keys are well spread, each partition will receive about the same number of records.
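"About the same number of records" holds on average, but over a short window random keys still show noticeable variation, which may account for some of the skew the question saw in 5-minute samples. The Python simulation below hashes random UUID keys into 32 partitions and reports the relative spread for a small and a larger number of messages. It uses MD5 as a stand-in for the partitioner's hash (an assumption to keep the example short; Kafka does not use MD5), and the message counts are arbitrary illustration values, not the question's data.

```python
import hashlib
import random
import statistics
import uuid


def stand_in_partition(key: str, num_partitions: int) -> int:
    """Stand-in hash partitioner (MD5, not Kafka's murmur2); any well-mixed hash behaves alike here."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partition_counts(num_messages: int, num_partitions: int, seed: int = 0) -> list[int]:
    """Count how many random-UUID-keyed messages land on each partition."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = [0] * num_partitions
    for _ in range(num_messages):
        key = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        counts[stand_in_partition(key, num_partitions)] += 1
    return counts


def relative_spread(counts: list[int]) -> float:
    """Coefficient of variation: standard deviation of the counts divided by their mean."""
    return statistics.pstdev(counts) / statistics.fmean(counts)


if __name__ == "__main__":
    for n in (320, 64_000):
        counts = partition_counts(n, num_partitions=32)
        print(f"{n:>6} messages: min={min(counts)} max={max(counts)} "
              f"relative spread={relative_spread(counts):.3f}")
```

For counts like these, the relative spread shrinks roughly as one over the square root of the average number of messages per partition, so short windows with few messages per partition look uneven even when the key itself is well distributed.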

So, to enable effective load balancing, your only responsibilities are to use a good strategy for creating message keys and to define your topics with at least as many partitions as the number of consumers you plan to scale out to.
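Both parts of that advice show up in how a group shares out partitions. The sketch below is modelled on the range strategy (Kafka's `RangeAssignor`) for a single topic; `range_assign` is our own simplified function, and other assignors (round-robin or sticky ones) place partitions differently. In every case the partition is the unit of work, so with 32 partitions and 8 consumers each consumer owns 4 partitions, and any consumer beyond the 32nd sits idle.

```python
def range_assign(num_partitions: int, consumer_ids: list[str]) -> dict[str, list[int]]:
    """Simplified range-style assignment of one topic's partitions to a consumer group.

    Consumers are sorted and given contiguous blocks; the first
    (num_partitions % num_consumers) of them receive one extra partition.
    """
    consumers = sorted(consumer_ids)
    if not consumers:
        return {}
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment = {}
    for i, consumer in enumerate(consumers):
        start = i * per_consumer + min(i, extra)
        size = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + size))
    return assignment


if __name__ == "__main__":
    eight = range_assign(32, [f"c{i:02d}" for i in range(8)])
    print({c: len(p) for c, p in eight.items()})  # 4 partitions per consumer

    forty = range_assign(32, [f"c{i:02d}" for i in range(40)])
    idle = [c for c, p in forty.items() if not p]
    print(f"{len(idle)} of 40 consumers have no partition")  # 8 of 40 consumers have no partition
```

A consumer's load is therefore the sum of the loads of the partitions it owns, and the partition count caps how far consumption can scale out within one group.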

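Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.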

 