I'm not able to get Kafka working with Spark Streaming. Here are the steps I've taken so far:
1. Downloaded the jar file spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar and moved it to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/jars
2. Added this line to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf/spark-defaults.conf.template:

       spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0
Kafka Version: kafka_2.10-0.10.2.2
Jar file version: spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar
Python code:

    import os
    from pyspark.streaming.kafka import KafkaUtils

    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0 pyspark-shell'
    kvs = KafkaUtils.createDirectStream(ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})
But I'm still getting the following error:
    Py4JJavaError: An error occurred while calling o39.createDirectStreamWithoutMessageHandler.
    : java.lang.NoClassDefFoundError: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$
        at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
        at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:59)
What am I doing wrong?
spark-defaults.conf.template is only a template and is not read by Spark, so your JARs will not be loaded. You must copy or rename this file to remove the .template suffix (e.g. cp conf/spark-defaults.conf.template conf/spark-defaults.conf).
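If you want to confirm that Spark actually picked up the renamed file, a minimal check from a pyspark shell (assuming the usual sc SparkContext is in scope) is:

    # Sanity check: was the renamed spark-defaults.conf actually read?
    # Prints the configured packages, or "<not set>" if the file was ignored.
    print(sc.getConf().get("spark.jars.packages", "<not set>"))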
You'll also need to download Spark 2.2 if you want to use those specific JAR files: the assembly JAR is built against Spark 2.2.0, while your installation (spark-2.0.0-bin-hadoop2.7) is Spark 2.0.0.
And make sure that your Spark build uses Scala 2.10 if that's the Kafka package you want to use; otherwise, use the _2.11 version of the artifact. A Scala version mismatch typically surfaces at runtime as exactly the kind of NoClassDefFoundError you're seeing.
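One way to check which Scala version your Spark build uses is to ask the JVM through PySpark's py4j gateway (sc._jvm is internal, so treat this as a debugging hack rather than a public API):

    # Print the Scala version baked into the running Spark build,
    # e.g. "version 2.11.8" -> use the _2.11 Kafka artifact.
    print(sc._jvm.scala.util.Properties.versionString())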
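Putting the pieces together, here is a minimal end-to-end sketch assuming a Spark 2.2.0 build on Scala 2.11 and therefore the _2.11 assembly artifact (swap in _2.10 if that matches your build); the topic name and broker address are taken from the question:

    import os

    # Must be set before the SparkContext is created, otherwise
    # --packages has no effect on the already-running JVM.
    os.environ['PYSPARK_SUBMIT_ARGS'] = (
        '--packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.2.0 '
        'pyspark-shell'
    )

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-direct-test")
    ssc = StreamingContext(sc, 5)  # 5-second micro-batches

    # Direct (receiver-less) stream against the local broker.
    kvs = KafkaUtils.createDirectStream(
        ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})
    kvs.pprint()  # print a few records per batch as a smoke test

    ssc.start()
    ssc.awaitTermination()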