Spark Streaming Kafka: ClassNotFoundException for ByteArrayDeserializer when run with spark-submit
I'm new to Scala / Spark Streaming, and to StackOverflow, so please excuse my formatting. I have made a Scala app that reads log files from a Kafka stream.
It runs fine within the IDE, but I'll be damned if I can get it to run using spark-submit. It always fails with:
ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
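To see whether that class is visible at all under a given launch method, and from which jar it was loaded, a small diagnostic can be run the same way as the app. This is a plain-Scala sketch, not part of the original app; the object name ClasspathCheck is hypothetical.

```scala
object ClasspathCheck {
  // Looks `className` up without initializing it and reports where it came from.
  // Uses the current thread's context class loader, falling back to this
  // object's own loader if none is set.
  def describe(className: String): String = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      val cls = Class.forName(className, false, loader)
      // JDK (bootstrap) classes report no CodeSource, hence the Option chain.
      val location = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("<no code source>")
      s"FOUND   $className from $location"
    } catch {
      case _: ClassNotFoundException => s"MISSING $className"
    }
  }

  def main(args: Array[String]): Unit = {
    // Defaults to the class from the stack trace; pass other class names as arguments.
    val names =
      if (args.nonEmpty) args.toList
      else List("org.apache.kafka.common.serialization.ByteArrayDeserializer")
    names.map(describe).foreach(println)
  }
}
```

Launching it with the same spark-submit flags as the app (for example via --class ClasspathCheck) shows what that classpath actually contains, which can then be compared with a run from the IDE.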
The line referenced in the exception is the load command in this snippet:
import spark.implicits._ // needed for the $"temp" column syntax

val records = spark
  .readStream
  .format("kafka") // <-- use KafkaSource
  .option("subscribe", kafkaTopic)
  .option("kafka.bootstrap.servers", kafkaBroker) // 192.168.4.86:9092
  .load()
  .selectExpr("CAST(value AS STRING) AS temp")
  .withColumn("record", deSerUDF($"temp")) // deSerUDF is the app's own UDF (not shown)
Relevant parts of pom.xml:
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.8</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
    <spark.version>2.2.1</spark.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>com.github.scala-incubator.io</groupId>
        <artifactId>scala-io-file_2.11</artifactId>
        <version>0.4.3-1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.0.0</version>
        <!-- version>2.0.0</version -->
    </dependency>
</dependencies>
Note: I don't think it is related, but I have to run
zip -d BroLogSpark.jar "META-INF/*.SF"
zip -d BroLogSpark.jar "META-INF/*.DSA"
to get past the complaints about the manifest signatures.
My jar file does not include any org.apache.kafka classes.
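Both observations, that the jar carries META-INF signature files and that it contains no org.apache.kafka classes, can be checked directly with java.util.jar.JarFile. A minimal sketch; the object name JarInspect and the default jar path are hypothetical.

```scala
import java.util.jar.JarFile
import scala.collection.mutable.ListBuffer

object JarInspect {
  // Returns every entry name in the jar that satisfies `keep`.
  def entries(jarPath: String)(keep: String => Boolean): List[String] = {
    val jar = new JarFile(jarPath)
    try {
      val found = ListBuffer.empty[String]
      val it = jar.entries()
      while (it.hasMoreElements) {
        val name = it.nextElement().getName
        if (keep(name)) found += name
      }
      found.toList
    } finally jar.close()
  }

  // Signature files like the ones removed with zip -d "META-INF/*.SF" / "*.DSA";
  // *.RSA and *.EC are included because they are also signature block files.
  private val SignatureFile = """META-INF/[^/]+\.(SF|DSA|RSA|EC)""".r

  def isSignatureFile(name: String): Boolean =
    SignatureFile.pattern.matcher(name).matches()

  def main(args: Array[String]): Unit = {
    // Pass the path of the assembled jar as the first argument.
    val jarPath = args.headOption.getOrElse("BroLogSpark.jar")
    val kafka = entries(jarPath)(_.startsWith("org/apache/kafka/"))
    val sigs = entries(jarPath)(isSignatureFile)
    println(s"${kafka.size} entries under org/apache/kafka/")
    println(s"signature files: ${sigs.mkString(", ")}")
  }
}
```

If the first count is zero, every Kafka class the job needs has to be supplied at launch time (for example through --packages), so the classpath seen by spark-submit is what matters.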
I have seen several posts that strongly suggest I have a version mismatch, and I have tried countless permutations of changes to pom.xml and to the spark-submit command. After each change, I confirm that the app still runs within the IDE, then try spark-submit on the same system, as the same user.
Below is my most recent attempt, where BroLogSpark.jar is in the current directory and 192.168.4.86:9092 and BroFile are the input arguments.
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.1,org.apache.kafka:kafka-clients:0.10.0.0 BroLogSpark.jar 192.168.4.86:9092 BroFile
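Since the suspected cause is a version mismatch, one low-tech safeguard is to derive every --packages coordinate from the same two values the pom.xml uses (Scala 2.11, Spark 2.2.1). A minimal sketch; SparkPackages and its methods are hypothetical names. The Spark Structured Streaming + Kafka integration guide lists spark-sql-kafka-0-10 (rather than spark-streaming-kafka-0-10) as the artifact that provides format("kafka"), so that is the module used in the example.

```scala
object SparkPackages {
  // Both values mirror the <properties> in the pom.xml above.
  val ScalaCompat = "2.11"
  val SparkVersion = "2.2.1"

  // Builds "org.apache.spark:<artifact>_<scala>:<spark>" so the Scala suffix
  // and the Spark version cannot drift from the build.
  def sparkModule(artifact: String): String =
    s"org.apache.spark:${artifact}_$ScalaCompat:$SparkVersion"

  // --packages takes a comma-separated list of Maven coordinates.
  def packagesArg(artifacts: Seq[String]): String =
    artifacts.map(sparkModule).mkString(",")

  def main(args: Array[String]): Unit =
    println(s"--packages ${packagesArg(Seq("spark-sql-kafka-0-10"))}")
}
```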
Add the following dependency too:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.10.0.0</version>
</dependency>