
Apache Spark - java.lang.NoClassDefFoundError

I have a Maven-based mixed Scala/Java application that can submit Spark jobs. My application jar "myapp.jar" has some nested jars inside its lib folder, one of which is "common.jar". I have defined a Class-Path attribute in the manifest file, like Class-Path: lib/common.jar. The Spark executor throws a java.lang.NoClassDefFoundError: com/myapp/common/myclass error when the application is submitted in yarn-client mode. The class (com/myapp/common/myclass.class) and the jar (common.jar) are there, nested inside my main myapp.jar. The fat jar is created using the spring-boot-maven-plugin, which nests the other jars inside the lib folder of the parent jar. I would prefer not to create a shaded flat jar, as that would create other issues. Is there any way the Spark executor JVM can load nested jars here?
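To make the failure mode concrete, here is a minimal sketch (class and file names are hypothetical) of how the executor's stock URLClassLoader behaves: it loads class files that are direct entries of a jar on its classpath, but it does not look inside jar entries nested within that jar, such as lib/common.jar.

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;

    public class NestedJarDemo {
        public static void main(String[] args) throws Exception {
            // Put only the outer fat jar on the classpath, as the executor does.
            URL appJar = new File("myapp.jar").toURI().toURL();
            try (URLClassLoader cl = new URLClassLoader(new URL[] { appJar })) {
                // Succeeds: the class file is a flat entry of myapp.jar.
                cl.loadClass("com.myapp.Xyz"); // hypothetical flat class
                // Throws ClassNotFoundException: the class lives inside the
                // nested lib/common.jar entry, which URLClassLoader ignores.
                cl.loadClass("com.myapp.common.myclass");
            }
        }
    }

(Spring Boot's own launcher works around this with a custom classloader that reads nested jars, but the Spark executor does not use that launcher.)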

EDIT: Spark (the JVM classloader) can find all the classes that are flat inside myapp.jar itself, i.e. com/myapp/abc.class, com/myapp/xyz.class, etc.

EDIT 2: The Spark executor classloader can also find some classes from the nested jar, but it throws NoClassDefFoundError for other classes in the same nested jar! Here's the error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host4.local): java.lang.NoClassDefFoundError: com/myapp/common/myclass
    at com.myapp.UserProfileRDD$.parse(UserProfileRDDInit.scala:111)
    at com.myapp.UserProfileRDDInit$$anonfun$generateUserProfileRDD$1.apply(UserProfileRDDInit.scala:87)
    at com.myapp.UserProfileRDDInit$$anonfun$generateUserProfileRDD$1.apply(UserProfileRDDInit.scala:87)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: com.myapp.common.myclass
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 14 more

I do submit myapp.jar with sparkConf.setJars(new String[] {"myapp.jar"}), and I have also tried setting it on spark.yarn.executor.extraClassPath.

EDIT 3: As a workaround, I extracted myapp.jar and set sparkConf.setJars(new String[] {"myapp.jar", "lib/common.jar"}) manually, and the error went away. But obviously I would have to do that for every nested jar, which is not desirable.
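For reference, here is a minimal sketch of automating that workaround (paths are hypothetical, and it assumes myapp.jar has already been extracted so its lib folder is on disk): enumerate every jar under lib/ and pass the whole list to setJars, rather than naming each nested jar by hand.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.SparkConf;

    public class SubmitWithNestedJars {
        public static SparkConf buildConf() {
            List<String> jars = new ArrayList<>();
            jars.add("myapp.jar");
            // lib/ is the folder obtained by extracting myapp.jar (hypothetical path).
            File[] nested = new File("lib").listFiles((dir, name) -> name.endsWith(".jar"));
            if (nested != null) {
                for (File jar : nested) {
                    jars.add(jar.getPath()); // e.g. lib/common.jar
                }
            }
            // Ship the main jar plus every nested jar to the executors.
            return new SparkConf()
                .setAppName("myapp")
                .setJars(jars.toArray(new String[0]));
        }
    }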

You can use the --jars option to give a comma-separated list of jars when starting the Spark application.

Something like:

spark-submit --jars lib/abc.jar,lib/xyz.jar --class <CLASSNAME> myapp.jar
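Note that --jars does essentially what setJars in EDIT 3 does: each listed jar is shipped to the executors and added to their classpath. So the nested jars still have to be extracted from the fat jar first and listed individually; the option does not unpack jars nested inside myapp.jar.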
