
Why does Spark on YARN in cluster mode fail with "Exception in thread "Driver" java.lang.NullPointerException"?

I'm using emr-5.4.0 with Spark 2.1.0. I understand what a NullPointerException is; this question is about why one was thrown in this particular case.

I cannot really figure out why I got a NullPointerException in the driver thread.

I got this weird job failure with the following error:

18/03/29 20:07:52 INFO ApplicationMaster: Starting the user application in a separate Thread
18/03/29 20:07:52 INFO ApplicationMaster: Waiting for spark context initialization...
Exception in thread "Driver" java.lang.NullPointerException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
18/03/29 20:07:52 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: SparkContext is null but app is still running!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:415)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:766)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/03/29 20:07:52 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.IllegalStateException: SparkContext is null but app is still running!)
18/03/29 20:07:52 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.lang.IllegalStateException: SparkContext is null but app is still running!)
18/03/29 20:07:52 INFO ApplicationMaster: Deleting staging directory hdfs://<ip-address>.ec2.internal:8020/user/hadoop/.sparkStaging/application_1522348295743_0010
18/03/29 20:07:52 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr

I submitted the job like this:

spark-submit --deploy-mode cluster --master yarn --num-executors 40 --executor-cores 16 --executor-memory 100g --driver-cores 8 --driver-memory 100g --class <package.class_name> --jars <s3://s3_path/some_lib.jar> <s3://s3_path/class.jar>

And my class looks like this:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

class MyClass {

  def main(args: Array[String]): Unit = {
    val c = new MyClass()
    c.process()
  }

  def process(): Unit = {
    val sparkConf = new SparkConf().setAppName("my-test")
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    import sparkSession.implicits._
    ....
  }

  ...
}

Change class MyClass to object MyClass and you're done.
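
A minimal sketch of the fix (assuming process() keeps the body from the question):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object MyClass {

  def main(args: Array[String]): Unit = {
    process()
  }

  def process(): Unit = {
    val sparkConf = new SparkConf().setAppName("my-test")
    val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
    import sparkSession.implicits._
    // ... the rest of the job as in the question
  }
}

An object compiles to a class with a static main forwarder, which is what Spark's ApplicationMaster invokes reflectively (more on that below).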

While we're at it, I'd also change class MyClass to object MyClass extends App and remove def main(args: Array[String]): Unit (as given by extends App).
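
Under that change the application could look like this (again a sketch, with the job body elided as in the question):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// extends App turns the object's body into the main method
object MyClass extends App {
  val sparkConf = new SparkConf().setAppName("my-test")
  val sparkSession: SparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
  import sparkSession.implicits._
  // ... the rest of the job as in the question
}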

I've reported an improvement for Spark 2.3.0 - [SPARK-23830] Spark on YARN in cluster deploy mode fail with NullPointerException when a Spark application is a Scala class not object - to have it reported nicely to an end user.


Digging deeper into how Spark on YARN works, the following message is printed when the ApplicationMaster of a Spark application starts the driver (you used --deploy-mode cluster --master yarn with spark-submit):

ApplicationMaster: Starting the user application in a separate Thread

Right after the INFO message you should see another:

ApplicationMaster: Waiting for spark context initialization...

This is part of the driver initialization when the ApplicationMaster runs.

The reason for Exception in thread "Driver" java.lang.NullPointerException is the following code:

val mainMethod = userClassLoader.loadClass(args.userClass)
  .getMethod("main", classOf[Array[String]])

My understanding is that, because MyClass is a class and not an object, the main method found above is an instance method (there is no static main to call), so the following line (which passes null as the target instance) "triggers" the NullPointerException:

mainMethod.invoke(null, userArgs.toArray)
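
You can reproduce this mechanism outside Spark. Here is a small self-contained sketch (ClassApp, ObjectApp, and Repro are hypothetical names for illustration, assumed to be compiled in the default package so Class.forName can find them by simple name). It shows that a main defined in a Scala object can be invoked reflectively with a null receiver, while the same main in a Scala class throws a NullPointerException:

import java.lang.reflect.Method

// Hypothetical stand-ins for the user's application
class ClassApp {
  def main(args: Array[String]): Unit = println("class main")
}

object ObjectApp {
  def main(args: Array[String]): Unit = println("object main")
}

object Repro {
  // Mirrors what ApplicationMaster does: load the class by name
  // and look up main(Array[String])
  def mainOf(className: String): Method =
    Class.forName(className).getMethod("main", classOf[Array[String]])

  def main(args: Array[String]): Unit = {
    // ObjectApp compiles to a class with a static main forwarder,
    // so a null receiver works and this prints "object main"
    mainOf("ObjectApp").invoke(null, Array.empty[String])

    // ClassApp only has an instance main, so a null receiver throws
    // java.lang.NullPointerException, just like in the driver thread
    mainOf("ClassApp").invoke(null, Array.empty[String])
  }
}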

The thread is indeed called Driver (as in Exception in thread "Driver" java.lang.NullPointerException), as set in these lines:

userThread.setContextClassLoader(userClassLoader)
userThread.setName("Driver")
userThread.start()

The line numbers differ since I used Spark 2.3.0 to reference the lines while you use emr-5.4.0 with Spark 2.1.0.
