
How to decrease the number of partitions of a Kafka topic?

I created a topic with 4 partitions on Kafka (the broker default, num.partitions=4). Now I want to change the number of partitions of this topic to 3. I tried running

./bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic my-topic --partitions 3

but nothing changed: the topic still has 4 partitions. Does anyone know why?

Apache Kafka doesn't support decreasing the number of partitions. Think of the topic as a whole; its partitions are a way of scaling out to improve performance. Data sent to the topic is spread across all of its partitions, so removing one of them means losing data.

You can't just delete a partition, because that would lose the data stored in it. The keys of the remaining data would also no longer be distributed correctly, so new messages would not be routed to the same partitions as existing messages with the same key.

For these reasons, Kafka does not support decreasing the partition count of an existing topic.
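The key-distribution point can be seen in a small sketch. It assumes a keyed message is routed to hash(key) modulo the partition count. Kafka's default partitioner hashes the serialized key bytes with murmur2; the sketch substitutes Java's String.hashCode() as a stand-in, so the partition numbers it prints are illustrative only. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: shows how a key's partition depends on the partition count. */
final class KeyRoutingSketch {

    // Stand-in for a partitioner: hash of the key modulo the partition count.
    // Assumption: String.hashCode() replaces Kafka's murmur2 hash of the key bytes.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("order-1", "order-2", "order-3", "order-4", "order-5");
        for (String key : keys) {
            int before = partitionFor(key, 4);
            int after = partitionFor(key, 3);
            // Flag every key whose partition differs between the two partition counts.
            System.out.printf("%s: partition %d of 4, partition %d of 3%s%n",
                    key, before, after, before == after ? "" : "  <- moved");
        }
    }
}
```

Any key flagged as moved would have its new messages written to a different partition than its existing ones if the partition count simply changed in place.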

What you can do is create a new topic with 3 partitions and then write a small program (or use an existing replication tool) to copy the data from the old 4-partition topic to the new 3-partition topic. That way everything runs through the same partitioner and all your keyed messages end up in the right partition. Once you are satisfied that all the data has been copied, delete the original 4-partition topic.

If you must keep the original topic name, then once the original topic has been deleted, create a new topic with the original name, copy the data back from the repartitioned topic, and delete that temporary repartitioned topic.
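The copy step can be modelled in memory to show why it keeps keyed messages together. This is a minimal sketch and not Kafka client code: a topic is modelled as a list of partitions, the partitioner is a stand-in based on String.hashCode() rather than Kafka's default, and the Message class with its seq counter is hypothetical, added only so that per-key order can be checked.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of copying a keyed topic into a topic with fewer partitions. */
final class RepartitionSketch {

    /** A keyed message; seq is a hypothetical per-key counter used only to check order. */
    static final class Message {
        final String key;
        final int seq;
        Message(String key, int seq) { this.key = key; this.seq = seq; }
    }

    // Stand-in partitioner (assumption: not Kafka's murmur2-based default).
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Creates a topic as a list of empty partitions.
    static List<List<Message>> newTopic(int partitionCount) {
        List<List<Message>> topic = new ArrayList<>();
        for (int i = 0; i < partitionCount; i++) topic.add(new ArrayList<>());
        return topic;
    }

    // Appends the message to the partition chosen by the partitioner.
    static void send(List<List<Message>> topic, Message m) {
        topic.get(partitionFor(m.key, topic.size())).add(m);
    }

    /** Reads each source partition in order and re-sends every message to the target. */
    static List<List<Message>> copy(List<List<Message>> source, int targetPartitions) {
        List<List<Message>> target = newTopic(targetPartitions);
        for (List<Message> partition : source)
            for (Message m : partition) send(target, m);
        return target;
    }

    public static void main(String[] args) {
        List<List<Message>> oldTopic = newTopic(4);
        for (int seq = 0; seq < 3; seq++)
            for (String key : new String[] {"a", "b", "c", "d", "e"})
                send(oldTopic, new Message(key, seq));

        List<List<Message>> copied = copy(oldTopic, 3);
        for (int p = 0; p < copied.size(); p++) {
            StringBuilder line = new StringBuilder("partition " + p + ":");
            for (Message m : copied.get(p)) line.append(' ').append(m.key).append('#').append(m.seq);
            System.out.println(line);
        }
    }
}
```

Because all messages for a key sit in one source partition and are re-sent in their original order, each key ends up in a single target partition with its relative order intact. Messages with different keys may interleave differently than before.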

I don't buy the answers above. "Removing a partition causes data loss" is a vague answer. Decreasing the number of partitions is nothing new in distributed systems, and in fact many systems support it. If you can afford the overhead of rebalancing the entire storage system while keeping the data consistent, decreasing the partition count is not impossible.

In my opinion, the real reason Kafka doesn't support decreasing the partition count is an important property of Kafka: it guarantees the order of messages within each partition, but not the order of messages across partitions (although such an order may happen to hold). This ordering property is crucial in many use cases. When a partition is removed, its messages cannot be redistributed to the other partitions while preserving order, because ordering between partitions is not guaranteed. However you distribute the data from the removed partition, you break the ordering guarantee of every partition you distribute it into. If Kafka didn't care about the order of messages within each partition, decreasing the partition count could easily be supported.
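A toy example of the ordering argument, under a loud assumption: the integers below stand for a hypothetical global production order, which Kafka does not maintain across partitions. Each partition is ordered on its own, but appending the removed partition's log to a surviving one puts older messages after newer ones. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: appending a removed partition's log to a surviving one. */
final class MergeOrderSketch {

    // Returns true when the numbers never decrease from one entry to the next.
    static boolean isNonDecreasing(List<Integer> log) {
        for (int i = 1; i < log.size(); i++)
            if (log.get(i) < log.get(i - 1)) return false;
        return true;
    }

    // Appends the removed partition's entries after the surviving partition's entries.
    static List<Integer> appendTo(List<Integer> surviving, List<Integer> removed) {
        List<Integer> merged = new ArrayList<>(surviving);
        merged.addAll(removed);
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical global production order; each partition is ordered on its own.
        List<Integer> surviving = Arrays.asList(1, 4, 5);
        List<Integer> removed = Arrays.asList(2, 3, 6);
        List<Integer> merged = appendTo(surviving, removed);
        System.out.println("merged log: " + merged);
        System.out.println("still in production order: " + isNonDecreasing(merged));
    }
}
```

Interleaving the two logs correctly would require exactly the cross-partition order that, per the argument above, Kafka does not guarantee.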

Decreasing the number of partitions is not supported.

You can write a standalone Java program that uses AdminUtils to create a topic with the partition count and replication factor you want.

import java.util.Properties;

import org.I0Itec.zkclient.ZkClient;

import kafka.admin.AdminUtils;
import kafka.utils.ZKStringSerializer$;
import kafka.utils.ZkUtils;

public class PartitionCreator {

    private final ZkUtils zkUtils;
    private final String topicName;
    private final int partitions;
    private final int replifactor;

    public PartitionCreator(String zkhost, String topicName, int partitions, int replifactor) {
        // Connect to ZooKeeper with 30-second session and connection timeouts.
        ZkClient zkClient = new ZkClient(zkhost, 30000, 30000, ZKStringSerializer$.MODULE$);
        zkUtils = ZkUtils.apply(zkClient, false);

        this.topicName = topicName;
        this.partitions = partitions;
        this.replifactor = replifactor;
    }

    public void createPartion() {
        // Create the topic with the requested partition count and replication factor.
        AdminUtils.createTopic(zkUtils, topicName, partitions, replifactor, new Properties());
        System.out.println("created/updated topic..");
    }
}

Note: createTopic() creates the topic only if it does not already exist. It does not update an existing topic, so it cannot be used to lower that topic's partition count.

Apache Kafka provides an alter command for changing topic behavior and adding or modifying configuration. We will use the alter command to add more partitions to an existing topic. Note: while Kafka allows us to add more partitions, it is NOT possible to decrease the number of partitions of a topic.
