
Copy messages from one Kafka topic to another using offsets/timestamps

For some data processing, we need to reprocess all the messages between two timestamps, say from 1st Jan to 15th Jan.
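Kafka record timestamps, and the timestamps passed to offsetsForTimes, are milliseconds since the Unix epoch, so a date window like this has to be converted to two numbers first. A minimal Python sketch, assuming the dates are in UTC and the year is 2024 (the question states neither), with the 15th included up to its last millisecond:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Exact integer milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Assumed window: 2024-01-01 00:00 UTC through the end of 2024-01-15 UTC (inclusive).
start_ms = to_epoch_millis(datetime(2024, 1, 1, tzinfo=timezone.utc))
end_ms = to_epoch_millis(datetime(2024, 1, 16, tzinfo=timezone.utc)) - 1

print(start_ms, end_ms)
```

Integer division of two timedeltas avoids the floating-point rounding that `dt.timestamp() * 1000` can introduce.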

To control the upper bound, we are planning to create a new topic that holds just these messages, so that once this task is complete we can delete the topic too. The new topic will hold data starting from particular offsets of the source topic:

partition 1 - from offset 100
partition 2 - from offset 2400, and so on

What is the most suitable solution for this? Roughly 10 lakh (1 million) messages fall within this range.

  1. Create a consumer for the source topic.
  2. Call .assign with the partitions you want to copy.
  3. Call .seek with the starting offset of each of those partitions. You can use the offsetsForTimes method to look these up for a given timestamp (for each partition it returns the earliest offset whose timestamp is at or after the one you pass, or null if there is none), then pass the results on to .seek.
  4. Create a producer.
  5. Start a poll loop (ideally one thread per partition, each holding a reference to the shared producer). KafkaConsumer is not thread-safe, so each thread needs its own consumer assigned to its partition; the producer, by contrast, can safely be shared.
  6. While polling, check the timestamp of each record:
    • If the record's timestamp is past the end of the range you are reading, stop the poll loop / thread.
    • Otherwise, send the record via the producer to your output topic.
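To make the per-partition logic of these steps concrete, here is a minimal in-memory sketch in Python. It does not talk to Kafka: each partition is modelled as a list of records sorted by offset, `offset_for_time` mimics what offsetsForTimes reports for one partition (the earliest offset whose timestamp is at or after the target, or nothing), and appending to an output list stands in for the producer's send. All names (`Record`, `offset_for_time`, `copy_range`) are hypothetical, and keeping each record in the same partition number of the output topic is an assumption, not something the steps require.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    offset: int     # position within its partition (may have gaps)
    timestamp: int  # epoch milliseconds, like Kafka record timestamps
    value: str


def offset_for_time(log: List[Record], ts: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= ts, or None if there is none.

    Models what offsetsForTimes reports for a single partition; a linear
    scan is used here purely for clarity.
    """
    for rec in log:
        if rec.timestamp >= ts:
            return rec.offset
    return None


def copy_range(
    source: Dict[int, List[Record]], start_ts: int, end_ts: int
) -> Dict[int, List[Record]]:
    """Copy each partition from start_ts until the first record past end_ts."""
    output: Dict[int, List[Record]] = {}
    for partition, log in source.items():       # step 2: the assigned partitions
        start = offset_for_time(log, start_ts)  # step 3: look up the start offset
        if start is None:
            continue  # nothing at or after start_ts in this partition
        copied = output.setdefault(partition, [])
        for rec in log:                          # step 5: the poll loop
            if rec.offset < start:
                continue  # skipped, as if we had seeked to `start`
            if rec.timestamp > end_ts:
                break  # step 6: past the window, stop this partition
            copied.append(rec)                   # stands in for the producer send
    return output
```

With the real Java client the corresponding calls are KafkaConsumer.assign, offsetsForTimes, seek and poll, plus KafkaProducer.send. One caveat applies to both the sketch and the steps above: with producer-assigned timestamps, records in a partition are not guaranteed to be in timestamp order, so stopping at the first out-of-range timestamp can skip a later record that still falls inside the window.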

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM