
Spark Streaming Kafka initial offset

I am using the Java Spark API with KafkaUtils.createDirectStream, and I want to track the offsets myself. There is a parameter called fromOffsets, which holds the starting offset for each partition of the Kafka topic. On the first run I have no idea how many partitions I will have, so how can I set this parameter? Also, do I need to set "auto.offset.reset" in the Kafka parameters? If so, will it affect my code's ability to recover from a known offset?

You have two options:

  • If you have no information about the partitions, do not pass that parameter to createDirectStream; the method has several overloads. In that case, either the earliest or the latest offset of each topicPartition is used, depending on the auto.offset.reset parameter.

  • You can look up the partitions and their offsets with the regular Kafka API. For an example, see "How to find the offset range for a topic-partition in Kafka 0.10?"
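To make the second option concrete, here is a minimal, dependency-free sketch of how the starting offsets could be assembled once the partitions are known. It assumes the partition ids and earliest offsets were already fetched with the Kafka client (for instance via KafkaConsumer.partitionsFor and KafkaConsumer.beginningOffsets, available in clients 0.10.1 and later). The class name, the method resolveFromOffsets, and the use of plain integer partition ids in place of Kafka's partition type are all hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: none of these names come from Spark or Kafka.
class InitialOffsets {

    /**
     * Picks a starting offset for every known partition of a single topic.
     * A previously saved offset wins; a partition without one (first run,
     * or a partition added later) starts at its earliest available offset.
     */
    static Map<Integer, Long> resolveFromOffsets(
            List<Integer> partitions,
            Map<Integer, Long> savedOffsets,
            Map<Integer, Long> earliestOffsets) {
        Map<Integer, Long> fromOffsets = new HashMap<>();
        for (Integer partition : partitions) {
            Long saved = savedOffsets.get(partition);
            // Assumption: an unknown earliest offset is treated as 0.
            long earliest = earliestOffsets.getOrDefault(partition, 0L);
            fromOffsets.put(partition, saved != null ? saved : earliest);
        }
        return fromOffsets;
    }

    public static void main(String[] args) {
        List<Integer> partitions = List.of(0, 1, 2);
        Map<Integer, Long> earliest = Map.of(0, 5L, 1, 0L, 2, 12L);

        // First run: nothing saved, so every partition starts at earliest.
        System.out.println(resolveFromOffsets(partitions, Map.of(), earliest));

        // Later run: partitions 0 and 1 resume; partition 2 has no saved offset.
        Map<Integer, Long> saved = Map.of(0, 40L, 1, 7L);
        System.out.println(resolveFromOffsets(partitions, saved, earliest));
    }
}
```

In a real job the resulting map would be converted to the key type that the chosen createDirectStream overload expects before being passed as the starting offsets. Because every partition then has an explicit offset, auto.offset.reset should only matter in the first option, where no offsets are supplied.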

