
Spark Java: java.lang.NoClassDefFoundError


I am using Spark standalone locally, with Maven as my build tool, so all the required dependencies for Spark and json-simple are set up. My Spark application runs fine for simple jobs such as word count, but as soon as I import JSONParser from the json-simple API I get a NoClassDefFoundError. I have tried adding the jar via SparkConf and via the SparkContext, but it still didn't help.

The following is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org</groupId>
<artifactId>sparketl</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>sparketl</name>
<url>http://maven.apache.org</url>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1.1</version>
    </dependency>


</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
</project>

And my driver class is:

package org.sparketl.etljobs;

import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

import scala.Tuple2;

/**
 * @author vijith.reddy
 *
 */
public final class SparkEtl {
    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err
                .println("Please use: SparkEtl <master> <input file> <output file>");
            System.exit(1);
        }

        @SuppressWarnings("resource")
        JavaSparkContext spark = new JavaSparkContext(args[0],
                "Json ", System.getenv("SPARK_HOME"),
                JavaSparkContext.jarOfClass(SparkEtl.class));
        //SparkConf sc = new SparkConf();
        //sc.setJars(new String[]{"/Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar"});
        spark.addJar("/Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar");
        JavaRDD<String> file = spark.textFile(args[1]);

        FlatMapFunction<String, String> jsonLine = jsonFile -> {
            return Arrays.asList(jsonFile.toLowerCase().split("\\r?\\n"));
        };

        JavaRDD<String> eachLine = file.flatMap(jsonLine);

        PairFunction<String, String, String> mapCountry = eachItem -> {
            JSONParser parser = new JSONParser();
            String country = "";
            try {
                Object obj = parser.parse(eachItem);
                JSONObject jsonObj = (JSONObject) obj;
                country = (String) jsonObj.get("country");
            } catch (Exception e) {
                e.printStackTrace();
            }
            return new Tuple2<String, String>(eachItem, country);
        };

        JavaPairRDD<String, String> pairs = eachLine.mapToPair(mapCountry);

        pairs.sortByKey(true).saveAsTextFile(args[2]);
        System.exit(0);
    }
}

I get the following error in my logs:

15/07/08 16:09:17 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/07/08 16:09:17 INFO SparkContext: Added JAR /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar at http://172.16.8.157:52255/jars/json-simple-1.1.1-sources.jar with timestamp 1436396957111
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(110248) called with curMem=0, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 265.0 MB)
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(10090) called with curMem=110248, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.9 KB, free 265.0 MB)
15/07/08 16:09:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.8.157:52257 (size: 9.9 KB, free: 265.1 MB)
15/07/08 16:09:17 INFO SparkContext: Created broadcast 0 from textFile at SparkEtl.java:35
15/07/08 16:09:17 INFO FileInputFormat: Total input paths to process : 1
15/07/08 16:09:17 INFO SparkContext: Starting job: sortByKey at SparkEtl.java:58
15/07/08 16:09:17 INFO DAGScheduler: Got job 0 (sortByKey at SparkEtl.java:58) with 2 output partitions (allowLocal=false)
15/07/08 16:09:17 INFO DAGScheduler: Final stage: ResultStage 0(sortByKey at SparkEtl.java:58)
15/07/08 16:09:17 INFO DAGScheduler: Parents of final stage: List()
15/07/08 16:09:17 INFO DAGScheduler: Missing parents: List()
15/07/08 16:09:17 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[5] at sortByKey at SparkEtl.java:58), which has no missing parents
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(5248) called with curMem=120338, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.1 KB, free 265.0 MB)
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(2888) called with curMem=125586, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 265.0 MB)
15/07/08 16:09:17 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.8.157:52257 (size: 2.8 KB, free: 265.1 MB)
15/07/08 16:09:17 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/07/08 16:09:17 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[5] at sortByKey at SparkEtl.java:58)
15/07/08 16:09:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/07/08 16:09:18 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.16.8.157:52260/user/Executor#2100827222]) with ID 0
15/07/08 16:09:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.16.8.157, PROCESS_LOCAL, 1560 bytes)
15/07/08 16:09:18 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.16.8.157, PROCESS_LOCAL, 1560 bytes)
15/07/08 16:09:18 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.8.157:52263 with 265.1 MB RAM, BlockManagerId(0, 172.16.8.157, 52263)
15/07/08 16:09:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.8.157:52263 (size: 2.8 KB, free: 265.1 MB)
15/07/08 16:09:18 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.8.157:52263 (size: 9.9 KB, free: 265.1 MB)
15/07/08 16:09:19 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.16.8.157): java.lang.NoClassDefFoundError: org/json/simple/parser/JSONParser
    at org.sparketl.etljobs.SparkEtl.lambda$main$b9f570ea$1(SparkEtl.java:44)
    at org.sparketl.etljobs.SparkEtl$$Lambda$11/1498038525.call(Unknown Source)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1030)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1030)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:42)
    at org.apache.spark.RangePartitioner$$anonfun$8.apply(Partitioner.scala:259)
    at org.apache.spark.RangePartitioner$$anonfun$8.apply(Partitioner.scala:257)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

15/07/08 16:09:19 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor 172.16.8.157: java.lang.NoClassDefFoundError (org/json/simple/parser/JSONParser) [duplicate 1]

My Spark config has:

spark.executor.memory   512m
spark.driver.cores      1
spark.driver.memory     512m
spark.driver.extraClassPath   /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar

Has anyone come across this issue? If so, what is the resolution?

According to spark.driver.extraClassPath (and the code), the library provided to Spark is the sources jar (json-simple-1.1.1-sources.jar). That jar most likely contains only .java source files, not compiled classes.

Changing it to json-simple-1.1.1.jar (with the full path, of course) should help.
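
For reference, a minimal sketch of the corrected setup, using the same local Maven repository paths as above but pointing at the compiled jar instead of the -sources jar (adjust the path for your machine):

spark.driver.extraClassPath   /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1.jar

// In the driver, ship the compiled jar to the executors as well
spark.addJar("/Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1.jar");

Alternatively, passing the compiled jar to spark-submit via --jars, or building an uber jar that bundles the json-simple classes, achieves the same thing.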
