I'm not able to get Kafka working with Spark Streaming. Here are the steps I've taken so far:
1. Downloaded the jar file spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar and moved it to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/jars
2. Added this line to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf/spark-defaults.conf.template:

       spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0
Kafka Version: kafka_2.10-0.10.2.2
Jar file version: spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar
Python code:

    import os
    from pyspark.streaming.kafka import KafkaUtils

    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0 pyspark-shell'
    kvs = KafkaUtils.createDirectStream(ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})
But I'm still getting the following error:
    Py4JJavaError: An error occurred while calling o39.createDirectStreamWithoutMessageHandler.
    : java.lang.NoClassDefFoundError: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$
        at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
        at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:59)
What am I doing wrong?
spark-defaults.conf.template is only a template and is not read by Spark, so your JARs will not be loaded. You must copy or rename this file to remove the .template suffix (e.g. cp conf/spark-defaults.conf.template conf/spark-defaults.conf).
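If you want to confirm that Spark actually picked up the renamed file, a minimal check from a pyspark shell (assuming the usual sc SparkContext is in scope) is:

    # Sanity check: was the renamed spark-defaults.conf actually read?
    # Prints the configured packages, or "<not set>" if the file was ignored.
    print(sc.getConf().get("spark.jars.packages", "<not set>"))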
You'll also need to download Spark 2.2 if you want to use those specific JAR files: the assembly JAR is built against Spark 2.2.0, while your installation (spark-2.0.0-bin-hadoop2.7) is Spark 2.0.0.
And make sure that your Spark build uses Scala 2.10 if that's the Kafka package you want to use; otherwise, use the _2.11 version of the artifact. A Scala version mismatch typically surfaces at runtime as exactly the kind of NoClassDefFoundError you're seeing.
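One way to check which Scala version your Spark build uses is to ask the JVM through PySpark's py4j gateway (sc._jvm is internal, so treat this as a debugging hack rather than a public API):

    # Print the Scala version baked into the running Spark build,
    # e.g. "version 2.11.8" -> use the _2.11 Kafka artifact.
    print(sc._jvm.scala.util.Properties.versionString())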
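Putting the pieces together, here is a minimal end-to-end sketch assuming a Spark 2.2.0 build on Scala 2.11 and therefore the _2.11 assembly artifact (swap in _2.10 if that matches your build); the topic name and broker address are taken from the question:

    import os

    # Must be set before the SparkContext is created, otherwise
    # --packages has no effect on the already-running JVM.
    os.environ['PYSPARK_SUBMIT_ARGS'] = (
        '--packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.2.0 '
        'pyspark-shell'
    )

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-direct-test")
    ssc = StreamingContext(sc, 5)  # 5-second micro-batches

    # Direct (receiver-less) stream against the local broker.
    kvs = KafkaUtils.createDirectStream(
        ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})
    kvs.pprint()  # print a few records per batch as a smoke test

    ssc.start()
    ssc.awaitTermination()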