
Spark runs on Yarn cluster exitCode=13

I am a Spark/YARN newbie and ran into exitCode=13 when I submitted a Spark job to a YARN cluster. When the job runs in local mode, everything is fine.

The command I used is:

/usr/hdp/current/spark-client/bin/spark-submit --class com.test.sparkTest --master yarn --deploy-mode cluster --num-executors 40 --executor-cores 4 --driver-memory 17g --executor-memory 22g --files /usr/hdp/current/spark-client/conf/hive-site.xml /home/user/sparkTest.jar

Spark error log:

16/04/12 17:59:30 INFO Client:
         client token: N/A
         diagnostics: Application application_1459460037715_23007 failed 2 times due to AM Container for appattempt_1459460037715_23007_000002 exited with  exitCode: 13
For more detailed output, check application tracking page:http://b-r06f2-prod.phx2.cpe.net:8088/cluster/app/application_1459460037715_23007Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e40_1459460037715_23007_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
        at org.apache.hadoop.util.Shell.run(Shell.java:487)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)


**Yarn logs**

    16/04/12 23:55:35 INFO mapreduce.TableInputFormatBase: Input split length: 977 M bytes.
16/04/12 23:55:41 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:55:51 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:01 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:11 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/04/12 23:56:11 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x152f0b4fc0e7488
16/04/12 23:56:11 INFO zookeeper.ZooKeeper: Session: 0x152f0b4fc0e7488 closed
16/04/12 23:56:11 INFO zookeeper.ClientCnxn: EventThread shut down
16/04/12 23:56:11 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 2). 2003 bytes result sent to driver
16/04/12 23:56:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 82134 ms on localhost (2/3)
16/04/12 23:56:17 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x4508c270df09803
16/04/12 23:56:17 INFO zookeeper.ZooKeeper: Session: 0x4508c270df09803 closed
...
    16/04/12 23:56:21 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
16/04/12 23:56:21 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Timed out waiting for SparkContext.)
16/04/12 23:56:21 INFO spark.SparkContext: Invoking stop() from shutdown hook

It seems that you have set the master in your code to local:

SparkConf.setMaster("local[*]")

You have to leave the master unset in the code and set it later when you issue spark-submit:

spark-submit --master yarn-client ...
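
For illustration, here is a minimal driver sketch with the master left unset (the object and app names are hypothetical, not from the original post); since no setMaster(...) call appears in the code, the --master value passed to spark-submit takes effect:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical driver: note there is no setMaster(...) call here, so the
    // --master value passed to spark-submit is used.
    object SparkTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("sparkTest")
        val sc = new SparkContext(conf)

        // ... job logic ...

        sc.stop()
      }
    }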

In case it helps someone:

Another possible cause of this error is passing the --class parameter incorrectly.
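
For example, --class must be the fully qualified name of the object that defines main. A sketch, assuming the class name from the original command:

    package com.test

    // The fully qualified name of this (hypothetical) entry point is
    // com.test.sparkTest, so the submit command must use
    // --class com.test.sparkTest; a typo or a wrong package name here
    // produces the same exitCode 13 failure.
    object sparkTest {
      def main(args: Array[String]): Unit = {
        // ... build the SparkConf/SparkContext and run the job ...
      }
    }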

I had exactly the same problem, but the above answer didn't work for me. However, when I ran this with spark-submit --deploy-mode client, everything worked fine.

I got this same error running a Spark SQL job in cluster mode. None of the other solutions worked for me, but looking at the job history server logs in Hadoop I found this stack trace:

20/02/05 23:01:24 INFO hive.metastore: Connected to metastore.
20/02/05 23:03:03 ERROR yarn.ApplicationMaster: Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
...


Looking at the Spark source code, you'll find that the AM timed out waiting for the spark.driver.port property to be set by the thread executing the user class.
So it could either be a transient issue, or you should investigate your code for the cause of the timeout.
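
If the timeout is caused by slow driver-side initialization rather than a bug, two things may help (both are suggestions, not part of the original answer): raise spark.yarn.am.waitTime via --conf on spark-submit, and create the SparkContext before doing any heavy setup in main. A minimal sketch of the latter, with hypothetical names:

    import org.apache.spark.{SparkConf, SparkContext}

    object DriverSketch { // hypothetical example object
      def main(args: Array[String]): Unit = {
        // In cluster mode the ApplicationMaster waits for the SparkContext
        // (and spark.driver.port) before it can proceed, and gives up after
        // spark.yarn.am.waitTime -- the 100000 ms seen in the log above.
        // Creating the context first keeps slow, Spark-independent setup
        // from eating into that window.
        val sc = new SparkContext(new SparkConf().setAppName("sparkTest"))

        // slowSetup()   // hypothetical heavy initialization, done after the context is up

        // ... job logic ...
        sc.stop()
      }
    }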

This exit code 13 is a tricky one...

For me it was a SyntaxError: invalid syntax in one of the scripts imported downstream of the spark-submit call.

When debugging this on AWS, if spark-submit did not initialize properly, you will not find the error in the Spark History Server; you will have to find it in the Spark logs: EMR UI Console -> Summary -> Log URI -> Containers -> application_xxx_xxx -> container_yyy_yy_yy -> stdout.gz.
