[英]Spark Kafka Streaming multi partition CommitAsync issue
I am reading a message from Kafka topic which has multiple partitions. 我正在阅读来自Kafka主题的消息,该主题具有多个分区。 While reading from message no issue, while Committing the offset range to Kafka, I am getting an error. 从消息中读取没有问题,而将偏移范围提交给Kafka时,却出现错误。 I tried my level best and not able to resolve this issue. 我已尽力尝试了该级别,但无法解决此问题。
Code 码
object ParallelStreamJob {
def main(args: Array[String]): Unit = {
val spark = SparkHelper.getOrCreateSparkSession()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
spark.sparkContext.setLogLevel("WARN")
val kafkaStream = {
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "localhost:9092",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "welcome3",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("test2")
val numPartitionsOfInputTopic = 2
val streams = (1 to numPartitionsOfInputTopic) map {
_ => KafkaUtils.createDirectStream[String, String]( ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams) )
}
streams
}
// var offsetRanges = Array[OffsetRange]()
kafkaStream.foreach(rdd=> {
rdd.foreachRDD(conRec=> {
val offsetRanges = conRec.asInstanceOf[HasOffsetRanges].offsetRanges
conRec.foreach(str=> {
println(str.value())
for (o <- offsetRanges) {
println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
}
})
kafkaStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
})
})
println(" Spark parallel reader is ready !!!")
ssc.start()
ssc.awaitTermination()
}
}
Error 错误
18/03/19 21:21:30 ERROR JobScheduler: Error running job streaming job 1521512490000 ms.0
java.lang.ClassCastException: scala.collection.immutable.Vector cannot be cast to org.apache.spark.streaming.kafka010.CanCommitOffsets
at com.cts.ignite.inventory.core.ParallelStreamJob$$anonfun$main$1$$anonfun$apply$1.apply(ParallelStreamJob.scala:48)
at com.cts.ignite.inventory.core.ParallelStreamJob$$anonfun$main$1$$anonfun$apply$1.apply(ParallelStreamJob.scala:39)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.a
you can commit the offset like 您可以像这样提交偏移量
stream.foreachRDD { rdd =>
val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
// some time later, after outputs have completed
stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}
in your case kafkaStream
is Seq
of stream. 在您的情况下, kafkaStream
是流的Seq
。 change you commit line. 更改您的提交行。 Reference: https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html 参考: https : //spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
将kafkaStream.asInstanceOf [CanCommitOffsets] .commitAsync(offsetRanges)行更改为rdd.asInstanceOf [CanCommitOffsets] .commitAsync(offsetRanges)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.