
Spark Streaming Kafka initial offset

I am using the Java Spark API and want to track offsets for KafkaUtils.createDirectStream. There is a parameter called fromOffsets, which records the offset for each partition of the Kafka topic. On the first run I don't know how many partitions there will be, so how can I set this parameter? Do I also need to set "auto.offset.reset" in the Kafka parameters? If so, will it affect my code's ability to recover from a known offset?

You have two options:

  • If you don't have any information about the partitions, don't pass that parameter to createDirectStream. There are several overloads of createDirectStream, including ones that take no offsets. In that case, either the earliest or the latest offset for each TopicPartition will be used, depending on the auto.offset.reset parameter.

  • You can find the partitions and their offsets using the usual Kafka API. For example, see "How to find the offset range for a topic-partition in Kafka 0.10?"
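Both options can be sketched in Java as follows. This is a minimal sketch assuming the Kafka 0.10 integration (spark-streaming-kafka-0-10) and kafka-clients 0.10.1 or later on the classpath; the topic name, broker address, group id and the loadSavedOffsets helper are hypothetical placeholders for your own setup and offset store. When a non-empty offset map is passed to ConsumerStrategies.Subscribe, those partitions start from the supplied offsets. auto.offset.reset only applies to partitions that have neither a supplied offset nor a committed offset for the group, so setting it does not prevent recovery from a known offset.

```java
import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaOffsetsExample {

    // Hypothetical helper: replace with your own offset store (database, HDFS, ...).
    // Returns an empty map on the first run, when nothing has been saved yet.
    static Map<TopicPartition, Long> loadSavedOffsets(String topic) {
        return new HashMap<>();
    }

    // Option 2: ask Kafka which partitions the topic has and where each one begins.
    static Map<TopicPartition, Long> beginningOffsets(String topic, Map<String, Object> kafkaParams) {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(kafkaParams)) {
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo info : consumer.partitionsFor(topic)) {
                partitions.add(new TopicPartition(info.topic(), info.partition()));
            }
            return consumer.beginningOffsets(partitions);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String topic = "my-topic"; // assumed topic name
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumed broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "offset-example"); // assumed group id
        // Consulted only for partitions with no supplied offset and no
        // committed offset for this group.id.
        kafkaParams.put("auto.offset.reset", "earliest");
        kafkaParams.put("enable.auto.commit", false);

        SparkConf conf = new SparkConf().setAppName("KafkaOffsetsExample").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Collection<String> topics = Collections.singletonList(topic);
        Map<TopicPartition, Long> fromOffsets = loadSavedOffsets(topic);

        JavaInputDStream<ConsumerRecord<String, String>> stream;
        if (fromOffsets.isEmpty()) {
            // Option 1: no offsets known, so auto.offset.reset decides per partition.
            stream = KafkaUtils.createDirectStream(jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
        } else {
            // Resume each listed partition from its saved offset.
            stream = KafkaUtils.createDirectStream(jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams, fromOffsets));
        }

        stream.map(ConsumerRecord::value).print();
        jssc.start();
        jssc.awaitTermination();
    }
}
```

The beginningOffsets helper corresponds to the second option: on the first run you could call it to build fromOffsets explicitly instead of relying on auto.offset.reset. After processing each batch, you would still need to save the offsets yourself so that loadSavedOffsets has something to return on the next run.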
