
Read from Kafka in a Spark batch job (fromOffset, untilOffset)

I saw on this question that we can read messages from Kafka in Spark batch jobs using org.apache.spark.streaming.kafka.KafkaUtils#createRDD, but this method requires an offset range, which needs a 'from offset' and an 'until offset'. I'm getting the 'from offset' from the org.apache.spark.streaming.kafka.KafkaCluster#getLatestLeaderOffsets method, but how can I get the 'until offset'? I'm using kafka-2.1.1-0.9.0.1.
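For context, here is a minimal sketch of the call shape being asked about; the broker address, topic name, partition and offset values are placeholders, and the 'until offset' is exactly the number the question is trying to obtain:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val sc = new SparkContext("local[*]", "kafka-batch-read")
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

// Each OffsetRange describes [fromOffset, untilOffset) for one topic partition.
val ranges = Array(OffsetRange("myTopic", 0, fromOffset = 0L, untilOffset = 100L))

val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, ranges)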

You can use GetOffsetShell to fetch the latest offset of any topic:

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic myTopic --time -1

This will return:

myTopic:12341:47841

which means 47841 is the latest offset for partition 12341 of topic myTopic (the output format is topic:partition:offset).
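If you would rather get the same numbers from inside the Spark job instead of shelling out, the KafkaCluster helper mentioned in the question can supply both ends of the range: getEarliestLeaderOffsets for the 'from' side and getLatestLeaderOffsets for the 'until' side. A sketch, assuming the 0.8-style spark-streaming-kafka connector with a public KafkaCluster class and a broker at localhost:9092:

import kafka.common.TopicAndPartition
import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaCluster, KafkaUtils, OffsetRange}

val sc = new SparkContext("local[*]", "kafka-batch-read")
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val kc = new KafkaCluster(kafkaParams)
val topic = "myTopic"

// Discover the topic's partitions, then ask the partition leaders for their
// earliest ("from") and latest ("until") offsets.
val partitions: Set[TopicAndPartition] = kc.getPartitions(Set(topic)).right.get
val earliest = kc.getEarliestLeaderOffsets(partitions).right.get
val latest = kc.getLatestLeaderOffsets(partitions).right.get

// Build one OffsetRange per partition covering [earliest, latest).
val offsetRanges: Array[OffsetRange] = partitions.toArray.map { tp =>
  OffsetRange(tp.topic, tp.partition, earliest(tp).offset, latest(tp).offset)
}

val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)
println(s"read ${rdd.count()} messages from $topic")

The latest leader offset is exclusive, so the resulting ranges cover every message that was in each partition at the moment the offsets were fetched.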
