
Spark fails with NoClassDefFoundError for org.apache.kafka.common.serialization.StringDeserializer

I am developing a generic Spark application that listens to a Kafka stream using Spark and Java.

I am using kafka_2.11-0.10.2.2 and spark-2.3.2-bin-hadoop2.7 - I also tried several other Kafka/Spark combinations before posting this question.

The code fails when loading the StringDeserializer class:

SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

Set<String> topicsSet = new HashSet<>();
topicsSet.add(topics);
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
kafkaParams.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
kafkaParams.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

The error I get is:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer

From Why does Spark application fail with "Exception in thread "main" java.lang.NoClassDefFoundError: ...StringDeserializer"? it seems that this could be a Scala version mismatch issue, but my pom.xml doesn't have that issue:

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>yyy.iot.ckc</groupId>
<artifactId>sparkpoc</artifactId>
<version>1.0-SNAPSHOT</version>

<name>sparkpoc</name>
<!-- FIXME change it to the project's website -->
<url>http://www.example.com</url>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <java.version>1.8</java.version>

    <spark.scala.version>2.11</spark.scala.version>
    <spark.version>2.3.2</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${spark.scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${spark.scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_${spark.scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>

</dependencies>

<build>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
        <plugins>
            <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
            <plugin>
                <artifactId>maven-clean-plugin</artifactId>
                <version>3.1.0</version>
            </plugin>
            <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
            <plugin>
                <artifactId>maven-resources-plugin</artifactId>
                <version>3.0.2</version>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
            </plugin>
            <plugin>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.22.1</version>
            </plugin>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.0.2</version>
            </plugin>
            <plugin>
                <artifactId>maven-install-plugin</artifactId>
                <version>2.5.2</version>
            </plugin>
            <plugin>
                <artifactId>maven-deploy-plugin</artifactId>
                <version>2.8.2</version>
            </plugin>
            <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
            <plugin>
                <artifactId>maven-site-plugin</artifactId>
                <version>3.7.1</version>
            </plugin>
            <plugin>
                <artifactId>maven-project-info-reports-plugin</artifactId>
                <version>3.0.0</version>
            </plugin>
        </plugins>
    </pluginManagement>
</build>
</project>

The submission script I use is:

./bin/spark-submit \
    --class "yyy.iot.ckc.KafkaDataModeler" \
    --master local[2] \
    ../sparkpoc/target/sparkpoc-1.0-SNAPSHOT.jar

Can anyone please point me in the right direction as to where I am going wrong?

You need to use the Maven Shade Plugin to package the Kafka clients along with your Spark application; then you can submit the shaded JAR, and the Kafka serializers should be found on the classpath.
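For illustration, something along these lines in the build section should be enough (the plugin version here is just an example; any recent 3.x release behaves the same). With no extra configuration it rebuilds the project JAR as an uber JAR at package time, pulling in the compile/runtime dependencies such as the Kafka clients while leaving out anything marked provided:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>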

Also, make sure you declare the Spark packages with provided scope:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${spark.scala.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_${spark.scala.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>

Spark runs the program by running an instance of a JVM. So if the libraries (JARs) are not in the classpath of that JVM, we run into this runtime exception. The solution is to package all the dependent JARs along with the main JAR. The following build script will work for that.

Also, as mentioned in https://stackoverflow.com/a/54583941/1224075, the scope of the spark-core and spark-streaming libraries needs to be declared as provided. This is because some of the libraries are implicitly provided by the Spark JVM.

The build section of the POM which worked for me -

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2.1</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
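With this in place, a plain mvn clean package should drop an additional sparkpoc-1.0-SNAPSHOT-jar-with-dependencies.jar into target/ (the exact name follows the artifactId and version in your POM), and that is the JAR to hand to spark-submit instead of the thin one, roughly:

./bin/spark-submit \
    --class "yyy.iot.ckc.KafkaDataModeler" \
    --master local[2] \
    ../sparkpoc/target/sparkpoc-1.0-SNAPSHOT-jar-with-dependencies.jar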
