
Spark and Kafka issue - Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.spark.streaming.kafka010.LocationStrategies

I will start by mentioning that I have tried all the suggestions in similar topics and none of them worked for me, so please note that this is not a duplicate question.

The issue I am having is as follows:

I am trying to run a sample Java application on Spark, using Spark Streaming and Kafka. I have added all the required dependencies:

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.3.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.3.0</version>
        </dependency>
    </dependencies>
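If the dependency really is declared in the POM, a quick way to confirm that Maven actually resolves it (and to spot scope problems such as an inherited `provided` scope) is to inspect the dependency tree; this is a sketch using the standard maven-dependency-plugin goal:

```shell
# Show how the Kafka integration is resolved and with which scope;
# a "provided" or missing entry explains a runtime NoClassDefFoundError.
mvn dependency:tree -Dincludes=org.apache.spark:spark-streaming-kafka-0-10_2.11
```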

After deploying the JAR to a server where I would like to run my application (I have already set up the environment with Spark and Kafka, and created the relevant topic), I try to spark-submit it and get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.spark.streaming.kafka010.LocationStrategies
        at JavaWordCount.main(JavaWordCount.java:47)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:508)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka010.LocationStrategies
        at java.net.URLClassLoader.findClass(URLClassLoader.java:609)
        at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:924)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:869)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:852)

It seems that the workers cannot resolve the dependencies as part of the environment. I did some research online, and many people suggest creating an assembly JAR with the maven-shade-plugin. So I also tried packaging the JAR this way with Maven, but still no success.
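An alternative to building an assembly JAR is to let spark-submit fetch the Kafka integration from Maven Central at submit time via `--packages`. A minimal sketch, assuming Spark 2.3.0 with Scala 2.11 as in the POM above (the application JAR name here is hypothetical; the class name is taken from the stack trace):

```shell
# --packages downloads spark-streaming-kafka-0-10 and its transitive
# dependencies (e.g. kafka-clients) and puts them on the driver and
# executor classpaths, so the thin application jar is enough.
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.0 \
  --class JavaWordCount \
  my-app.jar
```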

For reference, here is where the app is failing:

// Configure Spark to connect to Kafka running on local machine
Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
kafkaParams.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
                "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
                "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG,"group1");
kafkaParams.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"latest");
kafkaParams.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,true);

//Configure Spark to listen messages in topic test
Collection<String> topics = Arrays.asList("wordCount");

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkKafkaWordCount");

//Read messages in batch of 30 seconds
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

// Start reading messages from Kafka and get DStream
final JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(jssc, LocationStrategies.PreferConsistent(), 
                                      ConsumerStrategies.<String,String>Subscribe(topics,kafkaParams));

In the last line above, the class LocationStrategies is not recognized, even though I have added the correct dependency to the pom.xml.

Any ideas how to fix this issue?

I did not get this error once I included the following JARs under the `--jars` option of the spark-submit command:

  • sql-kafka-0-10_2.11-2.2.1.jar
  • kafka-clients-0.10.1.0.jar
  • spark-streaming-kafka-0-10_2.11-2.2.1.jar
  • spark-streaming-kafka-0-10-assembly_2.11-2.2.1.jar
  • spark-streaming-kafka_2.11-1.6.3.jar

I faced the same issue and didn't get much help from googling, but after reading many threads I understood that the dependencies mentioned in the pom.xml file have scope "provided", which means we need to specify the dependent JAR files at execution time. Also, all the examples inside the Apache Spark package are compiled into a single JAR file, so we need to specify the classpath to execute the required module. Download the necessary JAR files you mentioned in pom.xml and execute like this:

spark-submit --jars kafka-clients-1.1.0.jar,spark-streaming_2.11-2.3.0.jar,spark-streaming-kafka-0-10_2.11-2.3.0.jar --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount target/original-spark-examples_2.11-2.4.0-SNAPSHOT.jar <brokerip:port> <topic>

But before that, you need to rewrite the consumer properties in the Java file, or else you will get an error saying the configuration is missing:

kafkaParams.put("bootstrap.servers", brokers);
kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("group.id", "<group_id>");

Add this to your pom.xml:

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>

            <configuration>
                <archive>
                    <manifest>
                        <mainClass>your.main.Class</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
        </plugin>

and then build the jar with:

mvn clean compile assembly:single

You should get two JARs in the target directory: one without dependencies and one with dependencies (your-jar-1.0-jar-with-dependencies.jar).
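Before submitting, it is worth verifying that the fat JAR really contains the missing class (the JAR name here follows the hypothetical example above):

```shell
# The class must appear in the jar listing; if it doesn't,
# the NoClassDefFoundError will reappear at runtime.
jar tf your-jar-1.0-jar-with-dependencies.jar | grep LocationStrategies
```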
