Kafka reading all the messages of the topic

I would like to read all the messages from a Kafka topic at a scheduled interval to calculate some global index value. I am doing something like this:

props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("group.id", "test")
  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
  props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG,Int.MaxValue.toString)

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(util.Collections.singletonList(TOPIC))
  consumer.poll(10000)
  consumer.seekToBeginning(consumer.assignment())
   val records = consumer.poll(10000)

With this mechanism I get all the records, but is this an efficient way of doing it? It will be around 20,000,000 records (2.1 GB) per topic.

You might consider the Kafka Streams library for this. It supports different types of windows:

  1. Tumbling time window
  2. Hopping time window
  3. Sliding time window
  4. Session window

You can use tumbling windows to capture the events in the given interval and calculate your global index, as sketched below.

https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#windowing
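
For illustration, here is a minimal sketch of that idea with the kafka-streams-scala DSL (Kafka 2.x; exact class locations vary slightly between versions). It reuses the TOPIC constant from the question; the application id, broker address, one-hour window size, and the per-window count standing in for the real index calculation are all assumptions to adapt.

  import java.time.Duration
  import java.util.Properties

  import org.apache.kafka.streams.kstream.TimeWindows
  import org.apache.kafka.streams.scala.ImplicitConversions._
  import org.apache.kafka.streams.scala.Serdes._
  import org.apache.kafka.streams.scala.StreamsBuilder
  import org.apache.kafka.streams.{KafkaStreams, StreamsConfig}

  val props = new Properties()
  props.put(StreamsConfig.APPLICATION_ID_CONFIG, "global-index-app")  // hypothetical application id
  props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder, point at your brokers

  val builder = new StreamsBuilder()

  // Group every record under a single key and count it per one-hour tumbling window.
  // Replace count() with whatever aggregation your global index actually needs.
  builder
    .stream[String, String](TOPIC)
    .groupBy((_, _) => "all")
    .windowedBy(TimeWindows.of(Duration.ofHours(1)))
    .count()
    .toStream
    .foreach((window, count) => println(s"$window -> $count"))

  val streams = new KafkaStreams(builder.build(), props)
  streams.start()

Each window's result then becomes the index value for that interval, so you do not have to rescan the whole topic on every run.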
