Cost of rebalancing the partitions of a topic in Kafka

I am trying to come up with a design for consuming from Kafka, using version 0.8.1.1. I am thinking of a system where a consumer is created every few seconds, consumes data from Kafka, processes it, commits its offsets to Kafka, and then quits. At any point in time I expect 250-300 consumers to be active (running as thread pools on different machines).

  1. How and when does a rebalance of partitions happen?

  2. How costly is rebalancing partitions among the consumers? I expect a consumer to finish, or a new one to join the same consumer group, every few seconds, so I want to know the overhead and latency of a rebalance operation.

  3. Say consumer C1 has partitions P1, P2, P3 assigned to it and is processing a message M1 from partition P1. Now consumer C2 joins the group. How are the partitions divided between C1 and C2? Is it possible that C1's commit for M1 (C1 might take some time to commit) is rejected, so that M1 is treated as a fresh message and delivered to someone else? I know Kafka has an at-least-once delivery model, but I want to confirm whether a rebalance can cause the same message to be redelivered.

I'd rethink the design if I were you. Perhaps you need a consumer pool?

  1. Rebalancing happens every time a consumer joins or leaves the group.

  2. Kafka and the current consumer were definitely designed for long-running consumers. The new consumer design (planned for 0.9) will handle short-lived consumers better. In my experience a rebalance takes 100-500 ms, depending a lot on how ZooKeeper is doing.

  3. Yes, duplicates happen often during rebalancing. That's why we try to avoid rebalances. You can try to work around the duplicates by committing offsets more frequently, but with 300 consumers committing frequently and a lot of consumers joining and leaving, your ZooKeeper may become a bottleneck.
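To make point 3 concrete, here is a small Python simulation. It is only a sketch, not Kafka code: it assumes a range-style assignment (sorted partitions split into contiguous blocks across sorted consumers), which is my understanding of what the 0.8 high-level consumer does, and it assumes that after a rebalance the owner of a partition resumes from the last committed offset. The function names `range_assign` and `duplicates_after_rebalance` and all offsets are made up for illustration.

```python
def range_assign(partitions, consumers):
    """Range-style assignment (assumed): sort both lists and hand each
    consumer a contiguous block; the first `extra` consumers get one more."""
    parts = sorted(partitions)
    members = sorted(consumers)
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


def duplicates_after_rebalance(committed, processed):
    """Offsets that get read again after a rebalance.

    committed[p]: next offset to read according to the offset store.
    processed[p]: next offset the old owner would have processed.
    Everything in [committed, processed) was handled but never committed,
    so whoever owns the partition next starts over from `committed`.
    """
    return {
        p: list(range(committed[p], processed[p]))
        for p in sorted(processed)
        if processed[p] > committed[p]
    }


if __name__ == "__main__":
    partitions = ["P1", "P2", "P3"]

    before = range_assign(partitions, ["C1"])
    after = range_assign(partitions, ["C1", "C2"])
    print("before C2 joins:", before)  # {'C1': ['P1', 'P2', 'P3']}
    print("after C2 joins: ", after)   # {'C1': ['P1', 'P2'], 'C2': ['P3']}

    # Hypothetical state of C1 at the moment the rebalance starts.
    committed = {"P1": 10, "P2": 7, "P3": 4}
    processed = {"P1": 11, "P2": 7, "P3": 6}

    owner = {p: c for c, ps in after.items() for p in ps}
    for p, offsets in duplicates_after_rebalance(committed, processed).items():
        print(f"{p}: offsets {offsets} are delivered again, now to {owner[p]}")
```

In this made-up state, offsets 4 and 5 of P3 are processed twice, once by C1 and once by C2. Committing more often narrows the gap between `processed` and `committed`, which is the trade-off against ZooKeeper load mentioned in point 3; making the processing idempotent is the other way to cope.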
