
Apache Kafka Partitions Not Evenly Distributed with RoundRobin Partitioner

I am using a Kafka producer with the RoundRobinPartitioner to write to a topic that has 12 partitions.

The partitioner code can be found here: https://github.com/apache/kafka/blob/2.8/clients/src/main/java/org/apache/kafka/clients/producer/RoundRobinPartitioner.java

The partitioner itself picks the partition for each message correctly, in round-robin order. However, in the Kafka producer code the partition method is called twice, at lines 931 and 956 (the second call is inside an if block that handles a new batch). As a result, certain partitions receive no records, and I cannot achieve the parallelism of 12 that I want.

I have tried the following. I wrote a custom partitioner with effectively the same logic as the round-robin partitioner. The only difference is that if the partition method is called right after onNewBatch has been invoked on the partitioner, it returns the previously returned partition number again.

I am nervous about using this in production without understanding why the Kafka producer code is written the way described above. If someone can shed some light on it, I would really appreciate it. I am also open to any other suggestions for ensuring that records are evenly distributed across all partitions.
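A minimal sketch of the custom partitioner logic described above, written without the Kafka dependency so it runs on its own. The class `BatchAwareRoundRobinSketch`, its simplified method signatures, and the `simulateWorstCase` helper are hypothetical names for illustration; a real implementation would implement Kafka's `Partitioner` interface, track state per topic, and be thread-safe, none of which is attempted here.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in: mirrors the method names of Kafka's Partitioner interface
// but does not implement it, so the sketch runs without kafka-clients.
class BatchAwareRoundRobinSketch {
    private final AtomicInteger counter = new AtomicInteger(0);
    // Partition to hand out again on the next partition() call; -1 means "none".
    // Assumption: single-threaded use and a single topic.
    private int partitionToReuse = -1;

    int partition(int numPartitions) {
        if (partitionToReuse >= 0) {
            int reused = partitionToReuse;
            partitionToReuse = -1;
            return reused; // do not advance the round-robin counter
        }
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }

    // Called by the producer before it asks for a partition a second time.
    void onNewBatch(int prevPartition) {
        partitionToReuse = prevPartition;
    }

    // Simulates the worst case, where every send opens a new batch and
    // therefore triggers onNewBatch() plus a second partition() call.
    static int[] simulateWorstCase(int numPartitions, int numRecords) {
        BatchAwareRoundRobinSketch partitioner = new BatchAwareRoundRobinSketch();
        int[] recordsPerPartition = new int[numPartitions];
        for (int i = 0; i < numRecords; i++) {
            int partition = partitioner.partition(numPartitions);
            partitioner.onNewBatch(partition);
            partition = partitioner.partition(numPartitions);
            recordsPerPartition[partition]++;
        }
        return recordsPerPartition;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulateWorstCase(12, 24)));
    }
}
```

In this simulation, 24 records over 12 partitions end up as 2 records per partition, because the second call no longer consumes a slot in the rotation.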

The Kafka producer code: https://github.com/apache/kafka/blob/2.8/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java

Thank you in advance.

Solution

You should use DefaultPartitioner rather than RoundRobinPartitioner.

The comment in DefaultPartitioner.java says:

The default partitioning strategy:

  • If a partition is specified in the record, use it
  • If no partition is specified but a key is present choose a partition based on a hash of the key
  • If no partition or key is present choose the sticky partition that changes when the batch is full.

    See KIP-480 for details about sticky partitioning.

  • If you provide neither a partition number nor a key in your producer record, the sticky partitioner takes effect, which distributes records approximately evenly across your Kafka partitions. See "Tip #2: Learn about the new sticky partitioner in the producer API".
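As a rough illustration of that advice, the producer configuration simply leaves `partitioner.class` unset, so the Kafka 2.8 client linked in the question falls back to DefaultPartitioner. The sketch below only builds the `java.util.Properties`; the broker address, the choice of string serializers, and the class name `DefaultPartitionerConfigSketch` are assumptions for illustration.

```java
import java.util.Properties;

class DefaultPartitionerConfigSketch {
    // Builds producer properties that rely on the client's default partitioner.
    static Properties producerProps() {
        Properties props = new Properties();
        // Assumed broker address; replace with your own cluster.
        props.put("bootstrap.servers", "localhost:9092");
        // Assumed serializers; any serializer pair works for this purpose.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // "partitioner.class" is deliberately NOT set, so the default applies.
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        System.out.println("partitioner.class set: " + props.containsKey("partitioner.class"));
    }
}
```

With properties like these, records created with only a topic and a value (no key, no explicit partition) should take the sticky path described in the third rule of the comment quoted above.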

    Internal

    I also want to explain why RoundRobinPartitioner does not always behave in the round-robin way one would expect. partition() in RoundRobinPartitioner only guarantees that successive calls to partition() rotate through the partitions in round-robin order, which is not enough to ensure that records are evenly distributed across partitions.

    Note that the way KafkaProducer.doSend() invokes partition() is unusual: it may invoke partition() twice in a row for a single record.

    This subtlety can cause an uneven distribution of records when the number of partitions is even. Say we have 4 partitions (0, 1, 2, 3) and 8 records, and each send triggers the second partition() call.

    record 1 -> 2 partition() calls (return 0, return 1), finally assigned to partition 1
    record 2 -> 2 partition() calls (return 2, return 3), finally assigned to partition 3
    record 3 -> 2 partition() calls (return 0, return 1), finally assigned to partition 1
    record 4 -> 2 partition() calls (return 2, return 3), finally assigned to partition 3
    record 5 -> 2 partition() calls (return 0, return 1), finally assigned to partition 1
    ...
    

    See? Records are only distributed to partitions 1 and 3!
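The trace above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual Kafka code: `nextPartition` is a hypothetical stand-in for RoundRobinPartitioner.partition() (a shared counter taken modulo the partition count), and the `everySendOpensNewBatch` flag models the worst case in which every send triggers the second call.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class RoundRobinDoubleCallDemo {
    // Stand-in for RoundRobinPartitioner.partition(): every call advances the counter.
    static int nextPartition(AtomicInteger counter, int numPartitions) {
        return Math.floorMod(counter.getAndIncrement(), numPartitions);
    }

    // Returns how many records each partition receives.
    static int[] simulate(int numPartitions, int numRecords, boolean everySendOpensNewBatch) {
        AtomicInteger counter = new AtomicInteger(0);
        int[] recordsPerPartition = new int[numPartitions];
        for (int i = 0; i < numRecords; i++) {
            int partition = nextPartition(counter, numPartitions);
            if (everySendOpensNewBatch) {
                // Second call for the same record: the first result is discarded.
                partition = nextPartition(counter, numPartitions);
            }
            recordsPerPartition[partition]++;
        }
        return recordsPerPartition;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(simulate(4, 8, true)));  // prints [0, 4, 0, 4]
        System.out.println(Arrays.toString(simulate(4, 8, false))); // prints [2, 2, 2, 2]
    }
}
```

With an odd partition count the skipped slots shift on every pass through the rotation, so in this model `simulate(3, 6, true)` still spreads records evenly; that is why the problem shows up with an even number of partitions such as 4 or 12.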

    Conclusion

    RoundRobinPartitioner has a confusing name: it provides round robin over calls to partition(), not round robin over calls to KafkaProducer.send(). To make sure records are evenly distributed across all partitions, use DefaultPartitioner.
