
Copy messages from one Kafka topic to another using offsets/timestamps

For some data processing, we need to reprocess all the messages between two timestamps, say from 1st Jan to 15th Jan.

To control the upper bound, we are planning to create a new topic that holds only these messages, so that once the task is complete we can delete the topic as well. The new topic will have data starting from particular offsets of the source topic:

partition 1 - from offset 100
partition 2 - from offset 2400... and so on

What is the most suitable solution for this? Approximately 10 lakh (1 million) messages fall in this range.

  1. Create a consumer for the source topic.
  2. Call .assign for the partitions you want to copy.
  3. Call .seek with the starting offset of each of those partitions. You can use the offsetsForTimes method to look up the offsets for a specific timestamp, then pass those on to the seek method.
  4. Create a producer.
  5. Start a poll loop (ideally one thread per partition, each thread holding a reference to the created producer).
  6. While polling, check the timestamp of each record:
    • If the record timestamp exceeds the end of the range you are reading, stop the poll loop / thread.
    • Otherwise, send the record via the producer to your output topic.
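
The steps above could be sketched roughly as follows. This is a minimal single-threaded sketch, not a definitive implementation. It assumes the kafka-python client, whose `assign`, `seek` and `offsets_for_times` mirror the Java methods named above. The function names, the topic names `source-topic` and `reprocess-topic`, the broker address and the year 2024 are all hypothetical. It also assumes the target topic has at least as many partitions as the source, since each record is sent to the same partition number it came from.

```python
def copy_time_range(consumer, producer, partitions, start_ms, end_ms, target_topic):
    """Copy records with start_ms <= timestamp <= end_ms to target_topic.

    consumer and producer are expected to behave like kafka-python's
    KafkaConsumer and KafkaProducer; partitions is a list of TopicPartition.
    Returns the number of records sent.
    """
    # Earliest offset per partition whose timestamp is >= start_ms.
    # The value is None for a partition that has no such record.
    start = consumer.offsets_for_times({tp: start_ms for tp in partitions})
    # Snapshot of the current log ends, so the loop also terminates on
    # partitions that hold no record newer than end_ms.
    stop = consumer.end_offsets(partitions)

    active = {tp for tp in partitions if start.get(tp) is not None}
    if not active:
        return 0

    # Manual assignment plus seek: no consumer group is involved.
    consumer.assign(sorted(active))
    for tp in active:
        consumer.seek(tp, start[tp].offset)

    copied = 0
    while active:
        batches = consumer.poll(timeout_ms=1000)
        for tp, records in batches.items():
            if tp not in active:
                continue
            for record in records:
                if record.timestamp > end_ms:
                    # Past the upper bound: stop reading this partition.
                    consumer.pause(tp)
                    active.discard(tp)
                    break
                # Assumption: same partition number exists in the target topic.
                producer.send(target_topic, key=record.key, value=record.value,
                              partition=tp.partition, timestamp_ms=record.timestamp)
                copied += 1
        # Also stop once a partition has been read up to the snapshot end.
        for tp in list(active):
            if consumer.position(tp) >= stop[tp]:
                consumer.pause(tp)
                active.discard(tp)

    producer.flush()
    return copied


def run_example():
    """Hypothetical wiring; needs `pip install kafka-python` and a reachable broker."""
    from datetime import datetime, timezone
    from kafka import KafkaConsumer, KafkaProducer, TopicPartition

    def to_ms(dt):
        return int(dt.timestamp() * 1000)

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                             enable_auto_commit=False)
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    partitions = [TopicPartition("source-topic", p)
                  for p in consumer.partitions_for_topic("source-topic")]
    count = copy_time_range(
        consumer, producer, partitions,
        start_ms=to_ms(datetime(2024, 1, 1, tzinfo=timezone.utc)),
        end_ms=to_ms(datetime(2024, 1, 15, tzinfo=timezone.utc)),
        target_topic="reprocess-topic",
    )
    print(f"copied {count} records")
```

Keys and values are passed through as raw bytes, so no serializers are configured. One caveat: with producer-supplied timestamps, records in a partition are not guaranteed to be ordered by timestamp, so stopping at the first record past the end of the range may skip a late-arriving record that still falls inside it. If that matters, a variant could keep reading up to the snapshot end offset and filter each record by timestamp instead of stopping early.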


