
Kafka with Spark Streaming integration error

I'm not able to run Kafka with Spark Streaming. Here are the steps I've taken so far:

  1. Downloaded the jar file "spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar" and moved it to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/jars

  2. Added this line to /home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf/spark-defaults.conf.template -> spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10:2.2.0

Kafka Version: kafka_2.10-0.10.2.2

Jar file version: spark-streaming-kafka-0-8-assembly_2.10-2.2.0.jar

Python Code:

import os
from pyspark.streaming.kafka import KafkaUtils

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.10-2.2.0 pyspark-shell'
kvs = KafkaUtils.createDirectStream(ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})

But I'm still getting the following error:

Py4JJavaError: An error occurred while calling o39.createDirectStreamWithoutMessageHandler.
: java.lang.NoClassDefFoundError: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$
    at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
    at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:59)

What am I doing wrong?

spark-defaults.conf.template is only a template and is not read by Spark, so your JARs will not be loaded. You must copy or rename that file to spark-defaults.conf, dropping the .template suffix.
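For example (a minimal sketch, assuming the Spark layout from your question under /home/ec2-user/spark-2.0.0-bin-hadoop2.7):

cd /home/ec2-user/spark-2.0.0-bin-hadoop2.7/conf
cp spark-defaults.conf.template spark-defaults.conf

Then put the spark.jars.packages line into the new spark-defaults.conf, not into the template.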

You'll also need to download Spark 2.2 if you want to use those specific JAR files: your install is Spark 2.0.0, while the assembly JAR you downloaded is built against Spark 2.2.0.

Also make sure your Spark build uses Scala 2.10 if that's the Kafka package you want to use; otherwise, use the 2.11 version of the package. (The prebuilt spark-2.0.0-bin-hadoop2.7 download uses Scala 2.11, and that kind of Scala version mismatch is a classic cause of NoClassDefFoundError.)
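Putting it together, here is a minimal sketch of the corrected PySpark code, assuming a Spark 2.2 build with Scala 2.11, a local broker on localhost:9092, and the divolte-data topic from your question (note the colon, not a hyphen, before the version in the Maven coordinate):

import os

# Submit args must be set before the SparkContext is created.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 pyspark-shell'

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaDirectStream")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Direct stream straight against the broker; no receiver, no ZooKeeper-managed offsets
kvs = KafkaUtils.createDirectStream(ssc, ["divolte-data"], {"metadata.broker.list": "localhost:9092"})
kvs.pprint()

ssc.start()
ssc.awaitTermination()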
