Apache Kafka Message broadcasting

I am studying Apache Kafka and am confused about one point. Please help me understand the following scenario.

I have a topic with 5 partitions and 5 brokers in a Kafka cluster. I am maintaining my message order in partition 1 (say P1). I want to broadcast the messages of P1 to 10 consumers.

So my question is: how do these 10 consumers interact with topic partition P1?

This is probably not how you want to use Kafka.

Unless you're being explicit with how you set your keys, you can't really control which partition your messages end up in when producing to a topic. Partitions in Kafka are designed to be more like low-level plumbing, something that exists, but you don't usually have to interact with. On the consumer side, you will be assigned partitions based on how many consumers are active for a particular consumer group at any one time.
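To make the point about keys concrete, here is a minimal sketch of key-based partitioning. It is a toy stand-in, not Kafka's code: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this sketch uses `Arrays.hashCode` purely for illustration. The class and method names (`KeyPartitionSketch`, `partitionFor`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyPartitionSketch {

    // Toy model: hash the key bytes and take the result modulo the partition count.
    // ASSUMPTION: Arrays.hashCode stands in for the murmur2 hash Kafka actually uses.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 5;
        // The repeated key "order-1" lands on the same partition both times.
        for (String key : new String[] {"order-1", "order-2", "order-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

The property that matters is that equal keys always map to the same partition, which is what preserves per-key ordering.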

One way to get around this is to define a topic to have only a single partition, in which case, of course, all messages will go to that partition. This is not ideal, since Kafka won't be able to parallelize data ingestion or serving, but it is possible.

So, having said that, let's assume that you did manage to put all your messages in partition 1 of a specific topic. When you fire up a consumer of that topic with a consumer group id of consumer1, it will be assigned all the partitions for that topic, since that consumer is the only active one for that particular group id. If there is only one partition for that topic, as explained above, then that consumer will get all the data. If you then fire up a second consumer with the same group id, Kafka will notice there's a second consumer for that specific group id, but since there's only one partition, it can't assign any partitions to it, so that consumer will sit idle and get no data for as long as the first one stays active.
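The assignment behaviour described above can be modelled with a small toy. This is a sketch under assumptions: it spreads partitions round-robin over the members of one group, which is a simplification and not Kafka's actual assignor logic. The names `GroupAssignmentSketch` and `assign` are hypothetical, and partitions are numbered from 0 here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {

    // Toy model: hand out partitions 0..numPartitions-1 to the consumers of ONE group.
    // ASSUMPTION: plain round-robin; real Kafka assignors are configurable and more involved.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // One consumer, one partition: it owns the partition.
        System.out.println(assign(Arrays.asList("c1"), 1));
        // Two consumers in the same group, one partition: the second one is left empty.
        System.out.println(assign(Arrays.asList("c1", "c2"), 1));
    }
}
```

With a single partition, every group member after the first ends up with an empty assignment, which is why ten consumers in one group cannot all read P1.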

On the other hand, if you fire up a third consumer with a different consumer group id, say consumer2, that consumer will now get all the data, and it won't interfere at all with consumer1's message consumption, since Kafka keeps track of their consuming offsets separately. Kafka keeps track of which offset each particular ConsumerGroupId is at on each partition, so it won't get confused if one of them starts consuming slowly or stops for a while and restarts consuming later that day.

Much more detailed information on how Kafka works is here: https://kafka.apache.org/documentation/#gettingStarted
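The separate offset tracking can be illustrated with an in-memory toy. This is only a sketch of the idea, assuming a single partition held in a list and one stored position per group id; it does not use or mirror the Kafka client API, and the names `GroupOffsetSketch`, `append`, and `poll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupOffsetSketch {

    // The single partition: an append-only list of messages.
    private final List<String> log = new ArrayList<>();
    // Next offset to read, tracked independently per consumer group id.
    private final Map<String, Integer> nextOffset = new HashMap<>();

    void append(String message) {
        log.add(message);
    }

    // Return up to maxRecords unread messages for this group and advance only its offset.
    List<String> poll(String groupId, int maxRecords) {
        int from = nextOffset.getOrDefault(groupId, 0);
        int to = Math.min(log.size(), from + maxRecords);
        List<String> batch = new ArrayList<>(log.subList(from, to));
        nextOffset.put(groupId, to);
        return batch;
    }

    public static void main(String[] args) {
        GroupOffsetSketch partition = new GroupOffsetSketch();
        partition.append("m0");
        partition.append("m1");
        partition.append("m2");
        // consumer1 reads slowly; consumer2 still sees every message from the start.
        System.out.println("consumer1: " + partition.poll("consumer1", 2));
        System.out.println("consumer2: " + partition.poll("consumer2", 10));
        System.out.println("consumer1: " + partition.poll("consumer1", 10));
    }
}
```

Because each group id has its own position, a slow or paused group never changes what another group reads.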

And more information on how to use the Kafka consumer at this link: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

@mjuarez's answer is absolutely correct. For brevity, I would reduce it to the following:

Don't try to read only from a single partition, because it's a low-level construct and doing so somewhat undermines the parallelism of Kafka. You're much better off just creating more topics if you need finer separation of data.

I would also add that most of the time a consumer needn't know which partition a message came from, in the same way that I don't eat a sandwich differently depending on which store it came from.

@mjuarez is actually not correct and I am not sure why his comment is being falsely confirmed by the OP. You can absolutely explicitly tell Kafka which partition a producer record pertains to using the following:

ProducerRecord(
        java.lang.String topic,
        java.lang.Integer partition, // <--------- !!!
        java.lang.Long timestamp,
        K key,
        V value)

https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-java.lang.Integer-java.lang.Long-KV-

So most of what was said after that becomes irrelevant.

Now to address the OP's question directly: you want to accomplish a broadcast. For a message to be sent once and read by several readers, each reader needs its own consumer group.

And that use case is an absolutely valid Kafka usage paradigm.
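As a rough sketch of the one-group-per-reader idea, the snippet below builds one consumer configuration per reader, each with its own `group.id`. The property keys are standard Kafka consumer settings, but the group naming scheme (`broadcast-reader-N`), the class and method names, the broker address in `main`, and the choice of `StringDeserializer` and `auto.offset.reset=earliest` are assumptions made for illustration. Each `Properties` object would then be passed to its own `KafkaConsumer` from the kafka-clients library, which is not used here so the example stays dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class BroadcastGroupsSketch {

    // Build one consumer config per reader; a distinct group.id makes each reader
    // receive every message instead of sharing partitions with the others.
    static List<Properties> broadcastConsumerConfigs(String bootstrapServers, int readers) {
        List<Properties> configs = new ArrayList<>();
        for (int i = 1; i <= readers; i++) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", bootstrapServers);
            // ASSUMPTION: hypothetical naming scheme, any unique id per reader works.
            props.setProperty("group.id", "broadcast-reader-" + i);
            props.setProperty("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.setProperty("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            // ASSUMPTION: a new group should start from the oldest available message.
            props.setProperty("auto.offset.reset", "earliest");
            configs.add(props);
        }
        return configs;
    }

    public static void main(String[] args) {
        // ASSUMPTION: placeholder broker address.
        for (Properties props : broadcastConsumerConfigs("localhost:9092", 10)) {
            System.out.println(props.getProperty("group.id"));
        }
    }
}
```

With ten distinct group ids, all ten readers see the full contents of P1, each at its own pace.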

You can accomplish that using RabbitMQ too: https://www.rabbitmq.com/tutorials/tutorial-three-java.html ... but the way it is done is not ideal because multiple out-of-process queues are involved.
