
Apache Kafka message broadcasting

I am studying Apache Kafka and have some confusion. Please help me understand the following scenario.

I have a topic with 5 partitions in a Kafka cluster of 5 brokers. I am maintaining my message order in partition 1 (say P1). I want to broadcast the messages of P1 to 10 consumers.

So my question is: how do these 10 consumers interact with topic partition P1?

This is probably not how you want to use Kafka.

Unless you're being explicit with how you set your keys, you can't really control which partition your messages end up in when producing to a topic. Partitions in Kafka are designed to be more like low-level plumbing: something that exists, but that you don't usually have to interact with directly. On the consumer side, you will be assigned partitions based on how many consumers are active in a particular consumer group at any one time.
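To make the key-to-partition relationship concrete, here is a minimal stdlib-only sketch of how key-based partition selection works conceptually. Note that Kafka's actual default partitioner uses murmur2 hashing over the serialized key; the `hashCode()` call here is an illustrative stand-in, not Kafka's algorithm.

```java
// Conceptual sketch: records with the same non-null key always map to
// the same partition, but which partition that is depends on the hash,
// not on anything you choose directly.
public class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        // Stand-in for Kafka's murmur2-based default partitioner.
        return Math.abs(key.hashCode()) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("order-42", 5);
        int p2 = partitionFor("order-42", 5);
        System.out.println(p1 == p2);          // true: same key -> same partition
        System.out.println(p1 >= 0 && p1 < 5); // true: always a valid partition
    }
}
```

The takeaway: with default partitioning you get *consistency* per key, but not *control* over which partition is picked.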

One way to get around this is to define the topic with only a single partition, in which case all messages will of course go to that partition. This is not ideal, since Kafka won't be able to parallelize data ingestion or serving, but it is possible.

So, having said that, let's assume that you did manage to put all your messages in partition 1 of a specific topic. When you fire up a consumer of that topic with consumer group id consumer1, it will be assigned all the partitions for that topic, since it is the only active consumer for that group id. If there is only one partition for that topic, as explained above, then that consumer will get all the data. If you then fire up a second consumer with the same group id, Kafka will notice there is a second consumer for that group id, but since there is only one partition, it can't assign any partitions to it, so that consumer will never get any data.
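The behavior above follows from the rule that each partition is assigned to exactly one consumer within a group. A simplified sketch of that assignment (loosely in the spirit of Kafka's range/round-robin assignors, not the actual rebalancing protocol):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of how a consumer group divides partitions
// among its members. Real Kafka uses a group coordinator and pluggable
// assignors; the invariant shown here is the same, though.
public class AssignmentSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String c : consumers) out.put(c, new ArrayList<>());
        for (int p = 0; p < numPartitions; p++) {
            // Each partition goes to exactly one consumer in the group.
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // One partition, two consumers in the same group: the second one idles.
        System.out.println(assign(List.of("c1", "c2"), 1)); // {c1=[0], c2=[]}
    }
}
```

With one partition and two same-group consumers, one member ends up with an empty assignment, which is exactly why the second consumer never receives data.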

On the other hand, if you fire up a third consumer with a different consumer group id, say consumer2, that consumer will get all the data, and it won't interfere at all with consumer1's consumption, since Kafka tracks their consuming offsets separately. Kafka keeps track of which offset each ConsumerGroupId is at on each partition, so it won't get confused if one of them starts consuming slowly, or stops for a while and restarts consuming later that day.
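A toy model of that per-group offset bookkeeping, to show why two groups reading the same partition never interfere. The group names are illustrative, and real Kafka commits offsets through the broker rather than an in-memory map:

```java
import java.util.HashMap;
import java.util.Map;

// Each consumer group has its own committed offset for a partition,
// so one group's progress never affects another's.
public class OffsetSketch {
    // groupId -> next offset to read (for a single partition)
    static Map<String, Long> committed = new HashMap<>();

    // Returns how many records this "poll" delivered, then commits.
    static long poll(String groupId, long logEndOffset) {
        long from = committed.getOrDefault(groupId, 0L);
        committed.put(groupId, logEndOffset);
        return logEndOffset - from;
    }

    public static void main(String[] args) {
        // 100 records in the partition; each group independently reads all of them.
        System.out.println(poll("consumer1", 100)); // 100
        System.out.println(poll("consumer2", 100)); // 100 -- a full broadcast copy
        System.out.println(poll("consumer1", 100)); // 0   -- consumer1 is caught up
    }
}
```

This is the mechanism that makes the "one group id per reader" broadcast pattern work.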

Much more detailed information on how Kafka works here: https://kafka.apache.org/documentation/#gettingStarted

And more information on how to use the Kafka consumer at this link: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

@mjuarez's answer is absolutely correct; just for brevity I would reduce it to the following:

Don't try to read only from a single partition, because it's a low-level construct and it somewhat undermines the parallelism of Kafka. You're much better off just creating more topics if you need finer separation of data.

I would also add that most of the time a consumer needn't know which partition a message came from, in the same way that I don't eat a sandwich differently depending on which store it came from.

@mjuarez is actually not correct, and I am not sure why his answer is being confirmed by the OP. You can absolutely tell Kafka explicitly which partition a producer record goes to, using the following constructor:

ProducerRecord(
        java.lang.String topic,
        java.lang.Integer partition, // <--------- !!!
        java.lang.Long timestamp,
        K key,
        V value)

https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html#ProducerRecord-java.lang.String-java.lang.Integer-java.lang.Long-K-V-
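The precedence this constructor gives you can be sketched without a broker. This stdlib-only illustration mirrors the documented behavior (an explicit partition in the record wins over key hashing); the hash used is again a stand-in for Kafka's murmur2 default, and the topic/key names are made up for the example:

```java
// Sketch of the producer's partition-selection precedence: an explicit
// partition pins the record; otherwise the key hash decides.
public class ExplicitPartitionSketch {
    static int choosePartition(Integer explicit, String key, int numPartitions) {
        if (explicit != null) return explicit;           // producer pinned the partition
        return Math.abs(key.hashCode()) % numPartitions; // fall back to key hashing
    }

    public static void main(String[] args) {
        // new ProducerRecord<>("orders", 1, ts, key, value) would pin partition 1:
        System.out.println(choosePartition(1, "any-key", 5)); // 1
        // new ProducerRecord<>("orders", key, value) falls back to the key hash:
        System.out.println(choosePartition(null, "any-key", 5));
    }
}
```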

So most of what was said after that becomes irrelevant.

Now to address the OP's question directly: you want to accomplish a broadcast. To have a message sent once and read more than once, you would need a different consumer group for each reader.

And that use case is an absolutely valid Kafka usage paradigm.

You can accomplish that using RabbitMQ too: https://www.rabbitmq.com/tutorials/tutorial-three-java.html ... but the way it is done there is not ideal, because multiple out-of-process queues are involved.
