
Migrate Kafka Topic to new Cluster (and impact on Druid)

I am ingesting data into Druid from a Kafka topic. Now I want to migrate this topic to a new Kafka cluster. What are the possible ways to do this without duplicating data and without downtime?
I have considered the following possible ways to migrate the topic to the new Kafka cluster.

  1. Manual Migration:
    • Create a topic with the same configuration in the new Kafka cluster.
    • Stop pushing data to the old Kafka cluster.
    • Start pushing data to the new cluster.
    • Stop consuming from the old cluster.
    • Start consuming from the new cluster.
  2. Produce data in both Kafka clusters:
    • Create a topic with the same configuration in the new Kafka cluster.
    • Start producing messages in both Kafka clusters.
    • Change the Kafka topic configuration in Druid.
    • Reset Kafka topic offset in Druid.
    • Start consuming from the new cluster.
    • After successful migration, stop producing in the old Kafka cluster.
  3. Use MirrorMaker 2:
    • MM2 creates the Kafka topic in the new cluster.
    • Start replicating data between the clusters.
    • Move the producer and consumer to the new Kafka cluster.
    • The problems with this approach:
      1. Druid manages the Kafka topic's offsets in its metadata.
      2. MM2 will create a second copy of the topic in the new cluster under a prefixed name (source cluster alias + topic name).
      3. Does Druid support topic names with regex?

Note: Druid manages Kafka topic offset in its metadata.
Druid Version: 0.22.1
Old Kafka Cluster Version: 2.0

Maybe a slight modification of your option 1:

  1. Start publishing to the new cluster.
  2. Wait for the current supervisor to catch up on all the data in the old topic.
  3. Suspend the supervisor. This will force all the tasks to write and publish the segments. Wait for all the tasks for this supervisor to succeed. This is where "downtime" starts. All of the currently ingested data is still queryable while we switch to the new cluster. New data is being accumulated in the new cluster, but not being ingested in Druid.
  4. All the offset information for the current datasource is stored in the Metadata Storage. Delete those records using:

delete from druid_dataSource where dataSource = '{name}';

  5. Terminate the current supervisor.

  6. Submit the new spec with the new topic and new server information.

You can follow these steps:

1- On your new cluster, create your new topic (same name or a new name, it doesn't matter)

2- Change your app config to send messages to the new Kafka cluster

3- Wait until Druid has consumed all messages from the old Kafka cluster; you can verify this by checking the supervisor's lag and offset info

4- Suspend the supervisor, and wait for its tasks to publish their segments and exit successfully

5- Edit Druid's datasource spec: make sure useEarliestOffset is set to true, and change the connection info to consume from the new Kafka cluster (and the new topic name if it isn't the same)

6- Save the spec and resume the supervisor. Druid will hit a wall when checking the stored offsets, because it cannot find them in the new Kafka cluster, and will then start from the beginning
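The spec change in step 5 touches only the ioConfig section. A minimal sketch of the relevant fields; topic name and broker addresses are placeholders for your new cluster:

```python
# Relevant ioConfig fields from a Kafka supervisor spec, as a Python dict.
# Topic and broker addresses are placeholders.
io_config = {
    "type": "kafka",
    "topic": "my-topic",  # new topic name, if it changed
    "consumerProperties": {
        # Point the consumer at the new cluster's brokers.
        "bootstrap.servers": "new-broker-1:9092,new-broker-2:9092",
    },
    # With the old offsets unusable, this makes Druid start from the
    # beginning of the new topic instead of only from the latest messages.
    "useEarliestOffset": True,
}
```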

Options 1 and 2 will have downtime and you will lose all data in the existing topic.

Option 2 cannot guarantee you won't lose data or generate duplicates, since you are sending messages to two clusters at once.

There will be no way to migrate the Druid/Kafka offset data to the new cluster without at least trying MM2. You say you can reset the offset in Option 2, so why not do the same with Option 3? I haven't used Druid, but it should be able to support consuming from multiple topics, with pattern or not. With option 3, you don't need to modify any producer code until you are satisfied with the migration process.
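On the prefixed-topic concern: a rough sketch of an MM2 config for this migration, assuming a new enough Kafka on the target side (cluster aliases and hosts are placeholders):

```properties
# mm2.properties sketch -- aliases and broker addresses are placeholders
clusters = old, new
old.bootstrap.servers = old-broker:9092
new.bootstrap.servers = new-broker:9092

old->new.enabled = true
old->new.topics = my-topic

# The default replication policy would name the mirror "old.my-topic";
# IdentityReplicationPolicy (available in Kafka 3.0+) keeps the original
# topic name, sidestepping the prefixed-topic problem mentioned above.
replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy
```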
