
KafkaJS - Find offset most efficient way

(I'm a complete beginner when it comes to Kafka and KafkaJS, so I'm sorry if this is a silly question.)

We have a topic that retains 48 hours of data (millions of records), and I'm wondering what the best way is to get the last 20 minutes of data from this topic and then also stream new messages.

Each message in this topic is JSON and has a timestamp in UNIX milliseconds since epoch (UTC).

Performance is obviously a concern here, given the volume of data.

The Java client has a facility to seek to offsets by timestamp. There is a PR in KafkaJS for this, but it doesn't seem to have been reviewed and merged.

I believe node-rdkafka has it. An example is below (reference):

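// Assumes `consumer` is an already connected node-rdkafka KafkaConsumer.
const timeout = 5000;                       // ms to wait for the broker to answer
const since = Date.now() - (20*60*1000);    // 20 minutes ago, in ms since epoch

consumer.offsetsForTimes(
    [ {topic: 'hi', partition: 0, offset: since } ],  // `offset` carries the timestamp to look up
    timeout,
    (err, offsets) => console.log(err || offsets)
);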

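Once you have the offsets, you can seek each partition to its returned offset and start reading from there; the consumer then carries on with new messages as they arrive.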
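For a topic with more than one partition, the lookup has to be made per partition. Here is a minimal sketch of how that list could be built, together with a guard on the payload's own timestamp. Both `buildTimeQuery` and `isRecent` are hypothetical helpers, not part of any library, and the payload field name `timestamp` is an assumption about the message schema. The guard is there because the broker's lookup works on the Kafka record timestamp, which may not match the timestamp inside the JSON.

```javascript
// Hypothetical helper: builds the list that offsetsForTimes takes as its
// first argument, with `offset` holding the timestamp to search for.
function buildTimeQuery(topic, partitions, lookbackMs, now = Date.now()) {
    const since = now - lookbackMs;
    return partitions.map((partition) => ({ topic, partition, offset: since }));
}

// Hypothetical guard: true when the JSON payload's own timestamp is at or
// after the cutoff. The field name `timestamp` is an assumed schema detail.
function isRecent(rawValue, cutoffMs) {
    const payload = JSON.parse(rawValue.toString()); // accepts a Buffer or a string
    return Number(payload.timestamp) >= cutoffMs;
}

const TWENTY_MINUTES = 20 * 60 * 1000;
const query = buildTimeQuery('hi', [0, 1, 2], TWENTY_MINUTES);
console.log(query);
```

The array in `query` could be passed to `offsetsForTimes` in place of the single-partition list above. After seeking, `isRecent(message.value, cutoff)` could drop any early records whose payload timestamp is still older than the 20 minute window.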


 