
Kafka | Increase replication factor of multiple topics

I have a 3-broker Kafka cluster with many topics that have a replication factor of 1. I know I can increase it by passing a JSON file with the partition reassignment configuration to kafka-reassign-partitions.sh.

My question is: should I pass a single JSON file with the partition reassignment details of all topics, or should I create a JSON file for each topic and run them individually?

You can either create multiple .json files or use a single file that contains reassignment details for more than one topic:

{
  "version":1,
  "partitions":[
      {"topic":"topic_1","partition":0,"replicas":[0,1]},
      {"topic":"topic_1","partition":1,"replicas":[1,0]}, 
      {"topic":"topic_2","partition":0,"replicas":[0,1]},
      {"topic":"topic_2","partition":1,"replicas":[1,0]}
  ]
}
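With many topics, writing this file by hand gets tedious. Below is a minimal Python sketch that generates it. The function name build_reassignment, the topic-to-partition-count mapping, and the broker IDs are all assumptions for illustration. It spreads replicas round-robin and does not look at the current assignment, so on a real cluster you would probably want to keep each partition's existing replica first in the list to avoid moving data unnecessarily.

```python
import json

def build_reassignment(topic_partitions, brokers, replication_factor):
    """Build a reassignment plan that spreads replicas round-robin over brokers.

    topic_partitions: hypothetical mapping of topic name -> partition count.
    brokers: list of broker IDs in the cluster (assumed values).
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for topic, count in topic_partitions.items():
        for p in range(count):
            # Start at a different broker for each partition so the first
            # replica (the preferred leader) rotates across the cluster.
            replicas = [brokers[(p + i) % len(brokers)]
                        for i in range(replication_factor)]
            partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}

# Assumed inputs: two topics with two partitions each, brokers 0, 1 and 2.
plan = build_reassignment({"topic_1": 2, "topic_2": 2},
                          brokers=[0, 1, 2], replication_factor=2)
print(json.dumps(plan, indent=2))
```

Saving the printed JSON as increase-replication-factor.json gives a file in the same format as the hand-written one above.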

And then run

./bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json --execute

Once the reassignment has finished, describing the topics should show two replicas per partition:

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topic_1
Topic:topic_1           PartitionCount:2        ReplicationFactor:2     Configs:
        Topic: topic_1       Partition: 0    Leader: 0       Replicas: 0,1     Isr: 0,1
        Topic: topic_1       Partition: 1    Leader: 1       Replicas: 1,0     Isr: 1,0

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topic_2
Topic:topic_2           PartitionCount:2        ReplicationFactor:2     Configs:
        Topic: topic_2       Partition: 0    Leader: 0       Replicas: 0,1     Isr: 0,1
        Topic: topic_2       Partition: 1    Leader: 1       Replicas: 1,0     Isr: 1,0

Finally, the --verify option can be used with the tool to check the status of the partition reassignment. Note that the same increase-replication-factor.json (used with the --execute option) should be used with the --verify option:

> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication-factor.json  --verify
Status of partition reassignment:
Reassignment of partition [topic_1,0] completed successfully
Reassignment of partition [topic_1,1] is in progress
Reassignment of partition [topic_2,0] completed successfully
Reassignment of partition [topic_2,1] completed successfully 
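If you run --verify repeatedly across many topics, a small script can pick out the partitions that are not done yet. This is only a sketch: it assumes the output format shown above (other Kafka versions word these lines differently), and pending_partitions is a hypothetical helper name.

```python
import re

# Matches lines such as "Reassignment of partition [topic_1,1] is in progress".
STATUS_LINE = re.compile(
    r"Reassignment of partition \[(?P<topic>.+),(?P<partition>\d+)\] (?P<status>.+)"
)

def pending_partitions(verify_output):
    """Return (topic, partition) pairs that did not complete successfully."""
    pending = []
    for line in verify_output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match and "completed successfully" not in match.group("status"):
            pending.append((match.group("topic"), int(match.group("partition"))))
    return pending

# Sample text copied from the --verify output above.
sample = """Status of partition reassignment:
Reassignment of partition [topic_1,0] completed successfully
Reassignment of partition [topic_1,1] is in progress
Reassignment of partition [topic_2,0] completed successfully
Reassignment of partition [topic_2,1] completed successfully
"""
print(pending_partitions(sample))
```

An empty list means every partition in the file has finished reassigning.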

This is a balance of cost and risk.

  1. Reassigning all topics together:

    • Pros: easy to run, single command. Single task to monitor
    • Cons: Not a lot of control. Depending on your cluster, a lot of data could be copied by the process. While you can set reassignment quotas, it can be hard to precisely control the bandwidth used by the reassignment. Hence this can affect other services using the cluster
  2. Reassigning topics in "small" chunks:

    • Pros: This allows more control over the impact a large reassignment can have
    • Cons: Operators have to split the reassignment, then run and monitor each chunk

Depending on the size and usage of your cluster, you should be able to identify which method is best for you. In a busy cluster, I'd recommend setting reassignment quotas and reassigning topics in chunks; otherwise the reassignment will run as fast as it can, which can impact the cluster greatly. If your cluster is mostly fresh or unused, you may be able to reassign all topics at the same time.
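To illustrate the chunked approach, here is a minimal Python sketch that splits one reassignment plan into several smaller ones, keeping all partitions of a topic in the same chunk. The helper name chunk_by_topic and the chunk size are assumptions; pick a size that suits the amount of data each topic holds.

```python
import json

def chunk_by_topic(plan, topics_per_chunk):
    """Split a reassignment plan into smaller plans, keeping each topic whole."""
    if topics_per_chunk < 1:
        raise ValueError("topics_per_chunk must be at least 1")
    # Group partition entries by topic, preserving their original order.
    by_topic = {}
    for entry in plan["partitions"]:
        by_topic.setdefault(entry["topic"], []).append(entry)
    topics = list(by_topic)
    chunks = []
    for start in range(0, len(topics), topics_per_chunk):
        entries = []
        for topic in topics[start:start + topics_per_chunk]:
            entries.extend(by_topic[topic])
        chunks.append({"version": plan["version"], "partitions": entries})
    return chunks

# Example plan in the same format as the JSON file passed to the tool.
full_plan = {
    "version": 1,
    "partitions": [
        {"topic": "topic_1", "partition": 0, "replicas": [0, 1]},
        {"topic": "topic_1", "partition": 1, "replicas": [1, 0]},
        {"topic": "topic_2", "partition": 0, "replicas": [0, 1]},
        {"topic": "topic_2", "partition": 1, "replicas": [1, 0]},
    ],
}
for number, chunk in enumerate(chunk_by_topic(full_plan, topics_per_chunk=1), start=1):
    print(f"chunk {number}:", json.dumps(chunk))
```

Each chunk can be written to its own file and run with --execute, waiting for --verify to report completion before starting the next one. The tool's --throttle option (in bytes per second) can be combined with this to cap the replication bandwidth of each chunk.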
