I'm new to Spark and Kafka, and I'm using Spark Streaming to process data coming from a Kafka topic. For now, I just want to print the records to the console. I have a mini cluster with Spark on two nodes (Scala version 2.12.2 and spark-2.1.1) and a node with Kafka (version kafka_2.11-0.10.2.0). However, when I submit my code I get this error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 1.3.64.64, executor 1): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.<init>(KafkaRDD.scala:193)
at org.apache.spark.streaming.kafka010.KafkaRDD.compute(KafkaRDD.scala:185)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Does it have something to do with the versions? Or maybe my code is not correct?
Here is my code:
import java.util.UUID
import org.apache.kafka.clients.consumer.ConsumerRecord
import runtime.ScalaRunTime.stringOf
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
object followProduction {
  def main(args: Array[String]) = {
    val sparkConf = new SparkConf().setMaster("spark://<real adress here : 10. ...>:7077").setAppName("followProcess")
    val streamContext = new StreamingContext(sparkConf, Seconds(2))
    streamContext.checkpoint("checkpoint")
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "1.3.64.66:9094",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> s"${UUID.randomUUID().toString}",
      "auto.offset.reset" -> "earliest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val topics = Array("test")
    val stream = KafkaUtils.createDirectStream[String, String](
      streamContext,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )
    stream.print()
    //stream.map(record => (record.key, record.value)).count().print()
    streamContext.start()
    streamContext.awaitTermination()
  }
}
And here is my build:
name := "test"
version := "1.0"
scalaVersion := "2.12.2"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.1.1" %"provided"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "2.1.1" %"provided"
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.10" % "2.0.0"
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
Any help will be appreciated and thank you for your time.
Spark 2.1.x is compiled against Scala 2.11, not 2.12.
Try:
scalaVersion := "2.11.11"
Any 2.11.x version should work.
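Also, your Kafka streaming dependency refers to Scala 2.10 (the _2.10 suffix) when you need 2.11, and the same applies to your spark-core and spark-streaming artifacts. For the Kafka one: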
libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.1.1"
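Putting both fixes together, the dependency section of the build could look roughly like the sketch below. It assumes you keep the same project name and version as in the question and that the rest of the build (the assembly merge strategy) stays unchanged. Using %% instead of % lets sbt append the Scala binary suffix (_2.11 here) from scalaVersion, so the artifacts cannot drift apart from the compiler version again.

```scala
name := "test"
version := "1.0"

// Any 2.11.x release should match the Scala version Spark 2.1.x is built against
scalaVersion := "2.11.11"

// %% appends the Scala binary suffix (_2.11 here) to each artifact name
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                 % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-streaming"            % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.1"
)
```

After changing the Scala version, rebuild the assembly jar before resubmitting, since the old jar still contains classes compiled for the previous Scala version.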
Apart from the version mismatches, it looks like you are running against a Spark cluster, in which case all your JARs (libs) have to be shipped to the Spark worker nodes from the application where the Spark driver runs.
You can submit the jars through SparkConf using the .setJars(libs) method.
Something like this:
lazy val conf: SparkConf = new SparkConf()
  .setMaster(sparkMaster)
  .setAppName(sparkAppName)
  .set("spark.app.id", sparkAppId)
  .set("spark.submit.deployMode", "cluster")
  .setJars(libs) // setting jars for sparkContext
Note: libs is a Seq[String], i.e. a sequence of library paths.