
How Kafka consumers work when there are more consumers than partitions

Could anyone please explain how Kafka consumers work in the scenarios below, or point me to a link or resource about it?

  1. One consumer group with 5 consumers and a topic with 3 partitions (how does Kafka decide?)

  2. One consumer group with 5 consumers and a topic with 10 partitions (how does Kafka share the load?)

  3. Two consumer groups with 1 consumer each, and a Kafka cluster of 2 servers where one topic is partitioned between node 1 and node 2. How can duplication be avoided when consumers from different groups subscribe to the same partition?

The above may not be best practice when configuring Kafka, but I need to know how it is handled.

Thanks in advance.

It's not Kafka itself that assigns partitions, but one of the consumers. The first consumer to join a consumer group is elected as a sort of "leader", and it assigns partitions to the other consumers. Every time a new consumer joins the group, the broker acting as the group coordinator (not the cluster "controller") lets the leader consumer know, and a rebalance starts in which the leader reassigns the partitions. The same happens when a consumer leaves the group.
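The leader/rebalance flow described above can be sketched as a toy, single-process model in Python. Every name here (ToyConsumerGroup, join, leave, split_range) is hypothetical and invented for illustration; it is not the Kafka client API. Real rebalances involve the group coordinator broker and network round trips, which this model skips, and it assumes a single topic split into contiguous ranges.

```python
# Toy, single-process model of a consumer group rebalance. All names are
# hypothetical; this is NOT the Kafka client API.


def split_range(members, partitions):
    """Contiguous split of one topic's partitions; extras go to the first members."""
    members = sorted(members)
    per, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


class ToyConsumerGroup:
    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = []      # join order; members[0] acts as leader
        self.assignment = {}

    @property
    def leader(self):
        return self.members[0] if self.members else None

    def _rebalance(self):
        # Stands in for the leader's job: recompute the whole assignment.
        if self.members:
            self.assignment = split_range(self.members, self.partitions)
        else:
            self.assignment = {}

    def join(self, member):
        self.members.append(member)
        self._rebalance()      # a join triggers a rebalance

    def leave(self, member):
        self.members.remove(member)
        self._rebalance()      # so does a leave


group = ToyConsumerGroup(["p0", "p1", "p2"])
group.join("c1")   # first member, so it becomes leader
group.join("c2")   # rebalance: c1 -> [p0, p1], c2 -> [p2]
group.leave("c1")  # rebalance: in this model c2 becomes leader and takes all
print(group.leader, group.assignment)  # c2 {'c2': ['p0', 'p1', 'p2']}
```

In this model the next member in join order becomes leader when the leader leaves; in a real cluster the coordinator picks the new leader.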

To confirm that the consumer is involved in this: the partition assignment strategy is set by the partition.assignment.strategy property in the consumer configuration. In the Kafka 2.1 clients documented below, the default value is RangeAssignor, and the alternatives are RoundRobinAssignor and StickyAssignor. You can find more about how they work here:

https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html
https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/RoundRobinAssignor.html
https://kafka.apache.org/21/javadoc/org/apache/kafka/clients/consumer/StickyAssignor.html
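As a rough illustration of how RoundRobinAssignor differs from a per-topic range split, here is a Python sketch that lays out all partitions of all topics in order and deals them to the sorted consumers one at a time. It assumes every consumer subscribes to the same topics (the real assignor also handles differing subscriptions), and round_robin_assign is a hypothetical helper name, not Kafka code.

```python
def round_robin_assign(consumers, topic_partitions):
    """Deal all partitions of all topics to the consumers in turn.

    Assumes every consumer subscribes to every topic in topic_partitions,
    which maps topic name -> number of partitions.
    """
    consumers = sorted(consumers)
    # Sort by topic name, then numerically by partition number.
    partitions = sorted(
        (topic, p) for topic, count in topic_partitions.items() for p in range(count)
    )
    assignment = {c: [] for c in consumers}
    for i, (topic, p) in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(f"{topic}p{p}")
    return assignment


print(round_robin_assign(["C1", "C0"], {"t0": 3, "t1": 3}))
# {'C0': ['t0p0', 't0p2', 't1p1'], 'C1': ['t0p1', 't1p0', 't1p2']}
```

With two consumers and two 3-partition topics, each consumer ends up with 3 partitions, because the dealing continues across topic boundaries instead of restarting at the first consumer for every topic.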

That said, what specifically happens in your scenarios?

  1. 3 consumers will get one partition each. The other 2 will be idle.
  2. Each consumer will get 2 partitions.
  3. Using different consumer groups means pure pub/sub, where every consumer group gets the same messages. In your case, with 1 topic and 2 partitions (on 2 brokers), the two consumers, each in a different consumer group, will both get the same messages from both partitions. If consumers belong to different consumer groups, you cannot avoid this duplication; it's how Kafka works.
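The three answers above can be checked with a small Python simulation. It assumes a range-style split within each group and uses hypothetical consumer names (c0 to c4, a0, b0); which consumers actually stay idle in scenario 1 depends on the configured strategy and on the real member IDs. The key point for scenario 3 is that each group is assigned independently.

```python
def assign_group(consumers, num_partitions):
    """Range-style split of ONE topic's partitions among the members of ONE group."""
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


five = [f"c{i}" for i in range(5)]

# Scenario 1: 5 consumers, 3 partitions -> 3 consumers busy, 2 idle.
s1 = assign_group(five, 3)
idle = [c for c, parts in s1.items() if not parts]
print(idle)  # ['c3', 'c4']

# Scenario 2: 5 consumers, 10 partitions -> 2 partitions each.
s2 = assign_group(five, 10)
print(s2)  # {'c0': [0, 1], 'c1': [2, 3], 'c2': [4, 5], 'c3': [6, 7], 'c4': [8, 9]}

# Scenario 3: two groups with one consumer each, topic with 2 partitions.
# Groups are assigned independently, so each consumer reads every partition
# and every message is delivered once per group.
group_a = assign_group(["a0"], 2)
group_b = assign_group(["b0"], 2)
print(group_a, group_b)  # {'a0': [0, 1]} {'b0': [0, 1]}
```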

It depends on the partition.assignment.strategy property, which in Kafka 2.x clients is set to the class org.apache.kafka.clients.consumer.RangeAssignor by default. From the Javadoc:

The range assignor works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumers in lexicographic order. We then divide the number of partitions by the total number of consumers to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition. For example, suppose there are two consumers C0 and C1, two topics t0 and t1, and each topic has 3 partitions, resulting in partitions t0p0, t0p1, t0p2, t1p0, t1p1, and t1p2. The assignment will be:

C0: [t0p0, t0p1, t1p0, t1p1]
C1: [t0p2, t1p2]
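The per-topic range logic quoted from the Javadoc can be sketched in Python as follows. This is a simplified re-implementation assuming all consumers subscribe to all topics, not the actual RangeAssignor source; range_assign is a hypothetical name.

```python
def range_assign(consumers, topic_partitions):
    """Per-topic range assignment; topic_partitions maps topic -> partition count.

    Assumes every consumer subscribes to every topic.
    """
    consumers = sorted(consumers)                  # lexicographic order
    assignment = {c: [] for c in consumers}
    for topic in sorted(topic_partitions):
        n = topic_partitions[topic]
        per, extra = divmod(n, len(consumers))     # first `extra` consumers get one more
        start = 0
        for i, c in enumerate(consumers):
            count = per + (1 if i < extra else 0)
            assignment[c].extend(f"{topic}p{p}" for p in range(start, start + count))
            start += count
    return assignment


print(range_assign(["C1", "C0"], {"t0": 3, "t1": 3}))
# {'C0': ['t0p0', 't0p1', 't1p0', 't1p1'], 'C1': ['t0p2', 't1p2']}
```

Because the extra partition of every topic goes to the lexicographically first consumer, C0 accumulates 4 partitions while C1 gets 2, and the imbalance grows with the number of topics.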

You can provide your own strategy by implementing org.apache.kafka.clients.consumer.internals.PartitionAssignor (deprecated in Kafka 2.4 in favor of the public org.apache.kafka.clients.consumer.ConsumerPartitionAssignor interface). There is a good article on Medium about it.
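To illustrate the idea of a pluggable strategy, here is a Python sketch in which an assignor is just a function from (consumers, partitions) to an assignment, passed into a hypothetical rebalance helper. The Assignor type, rebalance and reverse_range are invented for illustration and do not mirror the Java interface's actual methods; the example strategy simply gives leftover partitions to the last consumers instead of the first.

```python
from typing import Callable, Dict, List

# Hypothetical strategy type: (sorted consumers, sorted partitions) -> assignment.
Assignor = Callable[[List[str], List[int]], Dict[str, List[int]]]


def reverse_range(consumers: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Custom strategy: contiguous chunks, but extra partitions go to the LAST consumers."""
    per, extra = divmod(len(partitions), len(consumers))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i >= len(consumers) - extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


def rebalance(consumers: List[str], partitions: List[int], assignor: Assignor) -> Dict[str, List[int]]:
    """Run the pluggable strategy and check that every partition has exactly one owner."""
    result = assignor(sorted(consumers), sorted(partitions))
    owned = sorted(p for parts in result.values() for p in parts)
    if owned != sorted(partitions):
        raise ValueError("assignor must assign every partition exactly once")
    return result


print(rebalance(["c1", "c0"], [2, 0, 1], reverse_range))
# {'c0': [0], 'c1': [1, 2]}
```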

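Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.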

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM