
Multiple Kafka Producers writing to the same topic - how to load balance consumption

Original post 2020-05-20 18:38:29 3 1 apache-kafka/ kafka-consumer-api/ kafka-producer-api

I have a design in which multiple producers (P1, P2, P3, P4, ... PN) write to a single topic, T1, which has 32 partitions.

On the other side I have up to 32 consumers in a single consumer group.

I would like to load balance my message consumption.

Reading the docs I could see 3 options:
1. Define the partition myself (drawback: I would have to know where the last message was sent, or define a partition range for each producer).
2. Define a key and leave the partition decision to Kafka's hash algorithm (drawback: load balancing would be left to luck).

(As per Chris's answer, the load balancing should be left to the hash algorithm.) In practice this does not give an equal distribution across the consumers, because consumers are bound to partitions, and I would have to understand the hash algorithm to choose a good key. To me that sounds the same as picking the partition myself (and the choice would have to be coordinated across the producers).

My current code uses a UUID as the key. An analysis of the partitions chosen, and consequently of the consumers doing work, shows a distribution that may be far from equal. I'm reproducing it below:

The image above shows the number of messages received by each partition in a 5-minute window using a UUID as my key; at that point in time I had 8 consumers. Consumption takes about 2 minutes. The cells in red show a queue of 9 requests in one of the consumers, while other consumers had low loads, or zero load like the consumer in green. If a random key is not a good option, what should I choose?

3. No partition and no key, leaving the choice to Kafka's round-robin algorithm (drawback: the round robin is internal to each producer, meaning all producers could be sending messages to the same partition). I also tested this option and the result is below:

The image above shows that the round robin is, apparently, internal to the producer.

Do I really need to write the overall load balancing algorithm myself? Am I missing something?

1 answer

Balancing load across consumers is one of the defining features of Kafka that allows horizontal scaling.

The record key used by the producer is what allows this to work. The key determines which partition a message goes to, and each partition is consumed sequentially by one consumer in the group. Your producers should therefore use a key strategy that produces an even spread and, if ordering is important, ensures that related messages share the same key. (Bear in mind that there are other considerations around in-flight requests if strict ordering is critical.)
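To make the key-to-partition mapping concrete, here is a minimal sketch. It assumes a stand-in hash: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses MD5 only because it ships with Python, so the partition numbers will not match what a real producer picks. The function name `partition_for` and the `order-...` keys are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 32  # matches topic T1 in the question


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition index.

    Stand-in for Kafka's default partitioner (which uses murmur2);
    MD5 is an assumption made purely for illustration.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned integer, then wrap it
    # into the range [0, num_partitions).
    return int.from_bytes(digest[:4], "big") % num_partitions


# Hypothetical keys: two events for one order, one event for another.
first = partition_for("order-1001")
second = partition_for("order-1001")
other = partition_for("order-2002")
print(first, second, other)
```

The point is only that the mapping is deterministic: the same key always lands on the same partition, which is what preserves ordering for related messages, while different keys are scattered by the hash.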

The former is what balances the load. There is no round robin involved on the consumer side: partitions are simply shared out as evenly as possible among the consumers in each group, and the consumers poll independently. If keys are well spread, each partition will have about the same number of records.
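A rough way to see what "about the same number" means is to simulate it. The sketch below counts how many randomly keyed messages fall on each of 32 partitions for a small and a large sample. It is an illustration under assumptions, not a measurement of Kafka: the hash is an MD5 stand-in rather than Kafka's murmur2, and `partition_counts` is a hypothetical helper.

```python
import hashlib
import uuid
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in hash partitioner (MD5, not Kafka's murmur2)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def partition_counts(num_messages: int, num_partitions: int = 32) -> list:
    """Count how many UUID-keyed messages land on each partition."""
    counts = Counter(
        partition_for(str(uuid.uuid4()), num_partitions)
        for _ in range(num_messages)
    )
    # Include partitions that received nothing, so the list has one
    # entry per partition.
    return [counts.get(p, 0) for p in range(num_partitions)]


for n in (100, 100_000):
    counts = partition_counts(n)
    mean = n / len(counts)
    print(f"{n} messages: min={min(counts)}, max={max(counts)}, mean={mean:.1f}")
```

With random keys the per-partition counts should converge in relative terms as the number of messages grows, while a small sample can look quite lumpy. That may be part of what a short measurement window shows, though the simulation cannot confirm it for any particular cluster.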

So, to enable effective load balancing, your only responsibility is to use a good strategy for creating message keys and to define your topics with at least as many partitions as the number of consumers you plan to scale out to.
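The partition-count advice can be illustrated with a simplified model of how partitions are shared out within a group. This is a sketch only: `assign_partitions` is a hypothetical function that deals partitions out like cards, and it is not one of Kafka's actual assignor implementations. The consumer names are made up.

```python
def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Deal partitions out to consumers one at a time.

    Simplified illustration; Kafka's real assignors differ in detail.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


for n_consumers in (8, 32, 40):
    consumers = [f"consumer-{i}" for i in range(n_consumers)]
    assignment = assign_partitions(32, consumers)
    sizes = [len(parts) for parts in assignment.values()]
    idle = sizes.count(0)
    print(f"{n_consumers} consumers: max {max(sizes)} partitions each, {idle} idle")
```

With 32 partitions, 8 consumers get 4 partitions each and 32 consumers get 1 each. With 40 consumers, 8 of them receive nothing, because a partition is never split between consumers in the same group. That is why the partition count caps how far consumption can scale out.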

Kafka - Multiple producers writing to same topic and order of message is important

How to list producers writing to a certain kafka topic

Kafka ordering with multiple producers on same topic and parititon

Load balance Kafka record consumption when using topic pattern

IoT - multiple Kafka producers to publish messages to same topic

Is it acceptable to have multiple producers on different servers writing to the same topic?

How does Kafka achieve its parallelism with multiple consumption on the same topic same partition?

Kafka Producer (with multiple instance) writing to same topic

Kafka multiple producer writing to same topic?

Writing to same topic on kafka



solution1 0 2020-05-23 16:28:08