[英]Spark fails with NoClassDefFoundError for org.apache.kafka.common.serialization.StringDeserializer
I am developing a generic Spark application that listens to a Kafka stream using Spark and Java. 我正在开发一个通用的Spark应用程序,该应用程序使用Spark和Java侦听Kafka流。
I am using kafka_2.11-0.10.2.2, spark-2.3.2-bin-hadoop2.7 - I also tried several other kafka/spark combinations before posting this question. 我正在使用kafka_2.11-0.10.2.2,spark-2.3.2-bin-hadoop2.7-在发布此问题之前,我还尝试了其他几种kafka / spark组合。
The code fails at loading StringDeserializer class: 代码在加载StringDeserializer类时失败:
SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
Set<String> topicsSet = new HashSet<>();
topicsSet.add(topics);
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
kafkaParams.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
kafkaParams.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
The error I get is: 我得到的错误是:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer
From Why does Spark application fail with "Exception in thread "main" java.lang.NoClassDefFoundError: ...StringDeserializer"? 来自为什么Spark应用程序会因“线程“主”中的异常java.lang.NoClassDefFoundError:... StringDeserializer”而失败? it seems that this could be a scala version mismatch issue, but my
pom.xml
doesn't have that issue: 看来这可能是scala版本不匹配的问题,但是我的
pom.xml
没有这个问题:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>yyy.iot.ckc</groupId>
<artifactId>sparkpoc</artifactId>
<version>1.0-SNAPSHOT</version>
<name>sparkpoc</name>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<java.version>1.8</java.version>
<spark.scala.version>2.11</spark.scala.version>
<spark.version>2.3.2</spark.version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${spark.scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${spark.scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_${spark.scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
<plugins>
<!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
<!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
The submission script I use is: 我使用的提交脚本是:
./bin/spark-submit \
--class "yyy.iot.ckc.KafkaDataModeler" \
--master local[2] \
../sparkpoc/target/sparkpoc-1.0-SNAPSHOT.jar
Can anyone please point me in the right direction as to where I am going wrong? 任何人都可以向我指出我要去哪里的错误方向吗?
You need to use the Maven Shade Plugin to package the Kafka clients along with your Spark application, then you can submit the shaded Jar, and the Kafka serializers should be found on the classpath. 您需要使用Maven Shade插件来将Kafka客户端与Spark应用程序打包在一起,然后才能提交带阴影的Jar,并且应该在类路径中找到Kafka序列化程序。
Also, make sure you set the provided Spark packages 另外,请确保您设置了提供的 Spark软件包
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${spark.scala.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${spark.scala.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
Spark runs the program as by running an instance of a JVM. Spark通过运行JVM实例来运行程序。 So if the libraries (JARs) are not in the classpath of that JVM we run into this runtime exception.
因此,如果库(JAR)不在该JVM的类路径中,我们将遇到此运行时异常。 The solution is to package all the dependent JARs along with main JAR.
解决方案是将所有相关的JAR与主JAR打包在一起。 The following build script will work for that.
以下构建脚本将适用于此。
Also, as mentioned in https://stackoverflow.com/a/54583941/1224075 the scope of the spark-core and spark-streaming libraries need to be declared as provided. 另外,如https://stackoverflow.com/a/54583941/1224075所述,需要声明spark-core和spark-streaming库的范围。 This is because some of the libraries are implicitly provided by the Spark JVM.
这是因为某些库是由Spark JVM隐式提供的。
The build section of the POM which worked for me - 对我有用的POM的构建部分-
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.2.1</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.