KafkaJS - Find offset most efficient way
(Complete beginner when it comes to Kafka and KafkaJS, so I'm sorry if this is a silly question.)
I have a problem where we have a topic that retains 48 hours of data (millions of records), and I'm wondering what the best way is to get the last 20 minutes of data from this topic and then also stream new messages.
Each message in this topic is JSON and has a timestamp in UNIX milliseconds since epoch (UTC).
Performance is obviously an issue here.
There is a facility in the Java client to seek to offsets by timestamp. There is a PR in KafkaJS for this, but it doesn't seem to have been verified and merged.
I suppose node-rdkafka has it. An example is below (reference):
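// node-rdkafka: the `offset` field carries the timestamp (ms since epoch) to look up
const timeout = 5000; // lookup timeout in ms (example value)
consumer.offsetsForTimes(
  [ { topic: 'hi', partition: 0, offset: Date.now() - (20 * 60 * 1000) } ],
  timeout,
  console.log // called with (err, offsets)
);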
When you get the offsets, you can seek to them and start reading.
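To make the "look up, then start reading" step concrete, here is a minimal sketch of how it might be wired together with node-rdkafka. The helper names (`buildTimeQuery`, `consumeFromWindow`, `isInWindow`) are hypothetical, the partition IDs are assumed to be known already (for example from topic metadata), and `consumer` is assumed to be an already connected KafkaConsumer. The lookup is resolved against the Kafka record timestamp, which may differ from the timestamp inside the JSON payload, so the sketch also includes an optional filter on the payload; the field name `timestamp` is a guess.

```javascript
// Hypothetical helper: one query entry per partition. node-rdkafka's
// offsetsForTimes reads the timestamp (ms since epoch) from the `offset` field.
function buildTimeQuery(topic, partitions, windowMs, now = Date.now()) {
  const since = now - windowMs;
  return partitions.map((partition) => ({ topic, partition, offset: since }));
}

// Hypothetical helper: look up the offsets, assign the consumer to them,
// and start consuming. `consumer` is assumed to be a connected KafkaConsumer.
function consumeFromWindow(consumer, topic, partitions, windowMs, timeoutMs, done) {
  const query = buildTimeQuery(topic, partitions, windowMs);
  consumer.offsetsForTimes(query, timeoutMs, (err, offsets) => {
    if (err) return done(err);
    consumer.assign(offsets); // each entry: { topic, partition, offset }
    consumer.consume();       // flowing mode; messages arrive on the 'data' event
    done(null, offsets);
  });
}

// Hypothetical filter on the payload's own timestamp. The field name
// `timestamp` is an assumption about the JSON in the topic.
function isInWindow(messageValue, since) {
  const payload = JSON.parse(messageValue.toString());
  return payload.timestamp >= since;
}
```

Usage would look something like `consumeFromWindow(consumer, 'hi', [0], 20 * 60 * 1000, 5000, console.log)`, with a `'data'` handler on the consumer that drops any message for which `isInWindow(message.value, since)` is false. After the old messages are read, the same consumer keeps receiving new ones, so no separate "catch up" and "stream" phases are needed.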