
Uneven partitioner in Kafka / no key

Original post: 2022-04-04 09:42:54 · Tags: java, apache-kafka, apache-kafka-streams, spring-kafka

I have a topic with 3 partitions and only 1 consumer. I am using the default partitioner, which in this case is "Sticky", and everything else is left at its defaults.

The records sent by the producer have no key, and I don't want them to have one. I simply want each record to go to a random partition, with the records evenly distributed across partitions.

However, I get a result similar to this, where one partition has far more records than the others:

[Screenshot: "Muestra" (sample)]

This leaves me with 2 questions:

1. Why did this happen?
2. How can I make the partitions even again?

I have tried to create a custom partitioner that looks at the size of each partition and sends each record to the partition with the least data. Is this possible?

1 answer

The Kafka documentation explains this:

The DefaultPartitioner now uses a sticky partitioning strategy. This means that records for specific topic with null keys and no assigned partition will be sent to the same partition until the batch is ready to be sent. When a new batch is created, a new partition is chosen. This decreases latency to produce, but it may result in uneven distribution of records across partitions in edge cases. Generally users will not be impacted, but this difference may be noticeable in tests and other situations producing records for a very short amount of time.
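The toy Java simulation below shows why short bursts of keyless records come out lopsided. It has no Kafka dependency and rests on two simplifying assumptions. First, every batch holds exactly `recordsPerBatch` records. Second, the sticky strategy jumps to a different, randomly chosen partition each time a batch fills. A real producer also starts a new batch when the current one is sent, so this only approximates the behaviour quoted above.

```java
import java.util.Arrays;
import java.util.Random;

// Toy model only: fixed-size batches, no brokers, no sender thread.
class StickyVsRoundRobin {

    // Sticky strategy (simplified): every record goes to the current partition
    // until the batch is full, then a different partition is picked at random.
    static int[] sticky(int numPartitions, int numRecords, int recordsPerBatch, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[numPartitions];
        int current = random.nextInt(numPartitions);
        int inBatch = 0;
        for (int i = 0; i < numRecords; i++) {
            if (inBatch == recordsPerBatch) {
                // Batch is full: start a new one on a different partition, if there is one.
                if (numPartitions > 1) {
                    current = (current + 1 + random.nextInt(numPartitions - 1)) % numPartitions;
                }
                inBatch = 0;
            }
            counts[current]++;
            inBatch++;
        }
        return counts;
    }

    // Round-robin: record i goes to partition i mod numPartitions.
    static int[] roundRobin(int numPartitions, int numRecords) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numRecords; i++) {
            counts[i % numPartitions]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("sticky:      " + Arrays.toString(sticky(3, 1000, 200, 42L)));
        System.out.println("round-robin: " + Arrays.toString(roundRobin(3, 1000)));
    }
}
```

With 1,000 records and 200 records per batch, the sticky run produces only 5 batches for 3 partitions. Each partition's count is therefore a multiple of 200, and the split can never be even. Round-robin gives [334, 333, 333]. When there are many more batches than partitions, the sticky counts even out. This matches the documentation's point that the effect mainly shows up in short runs.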

Switching to the RoundRobinPartitioner (instead of the DefaultPartitioner) is probably what you are looking for; see https://kafka.apache.org/documentation/#producerconfigs_partitioner.class. I don't know how constant your message rate is, but under normal circumstances (in production) the default partitioner distributes records fairly evenly.
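Here is a minimal sketch of the `partitioner.class` change. It uses the plain string config keys so it compiles with the JDK alone. The bootstrap address and the String serializers are placeholder assumptions; only `partitioner.class` matters here. Note that RoundRobinPartitioner ignores keys entirely. That makes it a good fit when, as in the question, records have no key, or when key-based ordering does not matter.

```java
import java.util.Properties;

class RoundRobinProducerConfig {

    // Builds producer settings that replace the default sticky behaviour with round-robin.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Send each record to the next partition in turn, regardless of key.
        props.put("partitioner.class", "org.apache.kafka.clients.producer.RoundRobinPartitioner");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092");
        // With kafka-clients on the classpath, these would be passed to the producer:
        // Producer<String, String> producer = new KafkaProducer<>(props);
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```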

Also make sure that linger.ms is 0, and reduce batch.size as much as you can, so that batches complete (and a new partition is chosen) more often.
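Here is a minimal sketch of those two settings, again using the string config keys so no Kafka dependency is needed. The 1024-byte value is a hypothetical choice for illustration; the documented default for batch.size is 16384 bytes. Smaller batches fill sooner, so the sticky partitioner moves to a new partition more often. The cost is more, smaller requests and lower throughput.

```java
import java.util.Properties;

class SmallBatchSettings {

    // Hypothetical value chosen for illustration, not a recommendation.
    static final int SMALL_BATCH_BYTES = 1024;

    static Properties apply(Properties props) {
        // linger.ms = 0: send as soon as possible instead of waiting for more records.
        props.put("linger.ms", "0");
        // A smaller batch.size makes batches fill (and the sticky partition change) sooner.
        props.put("batch.size", Integer.toString(SMALL_BATCH_BYTES));
        return props;
    }

    public static void main(String[] args) {
        Properties props = apply(new Properties());
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```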

Implementing a custom Partitioner is rather easy. Knowing which partition is the smallest is harder, though, because that changes very often. You may end up spending more time refreshing partition sizes and finding the smallest one than actually sending the message.
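A custom partitioner only has to return a partition number, so the hard part is the "smallest partition" signal. As a hypothetical stand-in, the sketch below tracks how many records this producer has sent to each partition and picks the lowest count. That count is not the real partition size: it ignores other producers, retention and compaction. In a real implementation you would call something like `choose()` from your `Partitioner.partition(...)` override, sizing the helper from `cluster.partitionCountForTopic(topic)`. `LeastSentChooser` is an illustrative name, not a Kafka class.

```java
// Hypothetical helper: approximates "least data" by records sent from this producer only.
class LeastSentChooser {

    private final long[] sent;

    LeastSentChooser(int numPartitions) {
        this.sent = new long[numPartitions];
    }

    // synchronized because a producer may call its partitioner from several threads.
    synchronized int choose() {
        int best = 0;
        for (int p = 1; p < sent.length; p++) {
            if (sent[p] < sent[best]) {
                best = p; // ties keep the lowest partition index
            }
        }
        sent[best]++;
        return best;
    }

    public static void main(String[] args) {
        LeastSentChooser chooser = new LeastSentChooser(3);
        for (int i = 0; i < 6; i++) {
            System.out.print(chooser.choose() + " ");
        }
        System.out.println();
    }
}
```

With a single producer and ties broken by the lowest index, the choices come out as 0 1 2 0 1 2, which is exactly round-robin. That is an argument for trying the built-in RoundRobinPartitioner before maintaining a partitioner of your own.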