
Spark Streaming Kafka: ClassNotFoundException for ByteArrayDeserializer when run with spark-submit

I'm new to Scala/Spark Streaming and to Stack Overflow, so please excuse my formatting. I have made a Scala app that reads log files from a Kafka stream. It runs fine within the IDE, but I'll be damned if I can get it to run using spark-submit. It always fails with:

ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer

The line referenced in the exception is the load command in this snippet:

val records = spark
  .readStream
  .format("kafka") // <-- use KafkaSource
  .option("subscribe", kafkaTopic)
  .option("kafka.bootstrap.servers", kafkaBroker) // 192.168.4.86:9092
  .load()
  .selectExpr("CAST(value AS STRING) AS temp")
  .withColumn("record", deSerUDF($"temp"))

  • IDE: IntelliJ
  • Spark: 2.2.1
  • Scala: 2.11.8
  • Kafka: kafka_2.11-0.10.0.0
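
For context, here is a minimal sketch of the setup assumed around that snippet (not taken from the question): a SparkSession, the implicits import that the $"temp" column syntax requires, and a placeholder standing in for the question's deSerUDF.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder()
  .appName("BroLogSpark")
  .getOrCreate()
import spark.implicits._   // required for the $"temp" column syntax

// Placeholder only: the real deSerUDF parses a log record out of the Kafka value string
val deSerUDF = udf((s: String) => s)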

Relevant parts of pom.xml:

<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.8</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
    <spark.version>2.2.1</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>com.github.scala-incubator.io</groupId>
        <artifactId>scala-io-file_2.11</artifactId>
        <version>0.4.3-1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.0.0</version>
        <!-- version>2.0.0</version -->
    </dependency>
</dependencies>

Note: I don't think it is related, but I have to use zip -d BroLogSpark.jar "META-INF/*.SF" and zip -d BroLogSpark.jar "META-INF/*.DSA" to get past complaints about the manifest signatures.

My jar file does not include any of org.apache.kafka. I have seen several posts that strongly suggest I have a version mismatch, and I have tried countless permutations of changes to pom.xml and spark-submit. After each change, I confirm that it still runs within the IDE, then try spark-submit again on the same system as the same user. Below is my most recent attempt, where my BroLogSpark.jar is in the current directory and "192.168.4.86:9092 profile" are the input arguments.

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.1,org.apache.kafka:kafka-clients:0.10.0.0 BroLogSpark.jar 192.168.4.86:9092 BroFile
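
One thing that stands out (my reading, not something confirmed in the question): the snippet uses the Structured Streaming Kafka source (readStream.format("kafka")), which lives in spark-sql-kafka-0-10, but the --packages list only names the DStream package spark-streaming-kafka-0-10. A variant worth trying, assuming Spark 2.2.1 with Scala 2.11, pulls the SQL Kafka source (and its kafka-clients dependency, which contains ByteArrayDeserializer) onto the classpath at submit time:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1 BroLogSpark.jar 192.168.4.86:9092 BroFile

Listing what actually ended up inside the application jar can also help narrow down the mismatch, e.g. jar tf BroLogSpark.jar | grep -i kafka to see whether any Kafka classes were bundled.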

Add the dependency below too:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.10.0.0</version>
</dependency>
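
Beyond the extra dependency, it also helps to make sure the runtime Kafka classes travel with the application jar. A minimal sketch, assuming the project builds with Maven: the maven-shade-plugin bundles the non-provided dependencies (including kafka-clients) into a fat jar and strips the signature files, which would also remove the need for the zip -d workaround mentioned in the question. The plugin version below is an assumption; any recent 3.x release behaves the same.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <!-- Strip signature files so the shaded jar does not trip
                             "Invalid signature file digest" errors at runtime -->
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

With a shaded jar, org.apache.kafka.common.serialization.ByteArrayDeserializer is inside BroLogSpark.jar itself, so spark-submit no longer depends on --packages resolving it.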
