Spark Job Container exited with exitCode: -1000

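I have been struggling to run a sample job with Spark 2.0.0 in YARN cluster mode. The job exits with exitCode: -1000 and gives no other clue. The same job runs fine in local mode.

Spark command: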

spark-submit \
--conf "spark.yarn.stagingDir=/xyz/warehouse/spark" \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar "$@"

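Test job class:

package com.xyz;

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TestJob {
    public static void main(String[] args) {
        // Master and deploy mode are supplied by spark-submit.
        SparkConf conf = new SparkConf();
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // Count a small in-memory collection to confirm the job runs end to end.
        System.out.println(
                "Total count: " + jsc.parallelize(Arrays.asList(1, 2, 3, 4)).count());
        jsc.stop();
    }
}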

Error log:

17/10/04 22:26:52 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:52 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.xyz
         start time: 1507181210893
         final status: UNDEFINED
         tracking URL: http://xyzserver:8088/proxy/application_1506717704791_130756/
         user: xyz
17/10/04 22:26:53 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:54 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:55 INFO Client: Application report for application_1506717704791_130756 (state: ACCEPTED)
17/10/04 22:26:56 INFO Client: Application report for application_1506717704791_130756 (state: FAILED)
17/10/04 22:26:56 INFO Client:
         client token: N/A
         diagnostics: Application application_1506717704791_130756 failed 5 times due to AM Container for appattempt_1506717704791_130756_000005 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://xyzserver:8088/cluster/app/application_1506717704791_130756Then, click on links to logs of each attempt.
Diagnostics: Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.xyz
         start time: 1507181210893
         final status: FAILED
         tracking URL: http://xyzserver:8088/cluster/app/application_1506717704791_130756
         user: xyz
17/10/04 22:26:56 INFO Client: Deleted staging directory /xyz/spark/.sparkStaging/application_1506717704791_130756
Exception in thread "main" org.apache.spark.SparkException: Application application_1506717704791_130756 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1213)
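The only concrete signal in the log above is the exit code, so it can help to pull it out and put a name on it. The sketch below is a minimal helper, not part of the original post: the function names `exit_code_from_diag` and `yarn_exit_name` are made up for illustration, and the mapping covers only a few of the constants defined in Hadoop's `ContainerExitStatus` class. There, -1000 is `INVALID`, which is often reported when a container fails before it is launched (for example while the NodeManager is localizing resources), although the log alone does not confirm that here.

```shell
#!/usr/bin/env bash

# Extract the first "exitCode: N" value from text on stdin.
exit_code_from_diag() {
  sed -n 's/.*exitCode: \(-\{0,1\}[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Map a YARN container exit status to the constant name used in
# org.apache.hadoop.yarn.api.records.ContainerExitStatus (partial list).
yarn_exit_name() {
  case "$1" in
    0)     echo "SUCCESS" ;;
    -1000) echo "INVALID" ;;
    -100)  echo "ABORTED" ;;
    -101)  echo "DISKS_FAILED" ;;
    -102)  echo "PREEMPTED" ;;
    -103)  echo "KILLED_EXCEEDED_VMEM" ;;
    -104)  echo "KILLED_EXCEEDED_PMEM" ;;
    *)     echo "UNKNOWN" ;;
  esac
}
```

With the spark-submit output saved to a file (say `submit.log`, a hypothetical name), `yarn_exit_name "$(exit_code_from_diag < submit.log)"` prints the constant name for the first exit code found.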

When I browse to the page http://xyzserver:8088/cluster/app/application_1506717704791_130756, it does not exist.

No YARN application logs were found:

$ yarn logs -applicationId application_1506717704791_130756
/apps/yarn/logs/xyz/logs/application_1506717704791_130756 does not have any log files.

What is the likely root cause of this error, and how can I get detailed error logs?
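On the second half of the question, one way to look for more detail is to ask the ResourceManager directly rather than relying on aggregated container logs, which may be empty if the ApplicationMaster container never started. The following is a rough sketch under that assumption. The function names are hypothetical, the `yarn` subcommands are the standard Hadoop 2.x CLI ones, and whether they return anything depends on how long the cluster retains finished applications. If they come back empty too, the NodeManager's own log on the host that ran the attempt is usually the next place to look; its location is cluster-specific.

```shell
#!/usr/bin/env bash

# Pull the first application attempt ID out of text on stdin.
attempt_id_from_diag() {
  grep -o 'appattempt_[0-9]*_[0-9]*_[0-9]*' | head -n 1
}

# Ask the ResourceManager for everything it still has on an application.
# Requires a configured `yarn` client; nothing here runs until it is called.
collect_yarn_info() {
  local app_id="$1"
  yarn application -status "$app_id"        # state, diagnostics, tracking URL
  yarn applicationattempt -list "$app_id"   # one line per AM attempt
  yarn logs -applicationId "$app_id"        # aggregated logs, if any exist
}
```

For the run in the question this would be invoked as `collect_yarn_info application_1506717704791_130756`.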

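After spending almost a whole day on this, I found the root cause. When I removed spark.yarn.stagingDir, the job started working, but I am still not sure why Spark complains about that setting.

Previous spark-submit: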

spark-submit \
--conf "spark.yarn.stagingDir=/xyz/warehouse/spark" \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar "$@"

New one:

spark-submit \
--queue xyz \
--class com.xyz.TestJob \
--master yarn \
--deploy-mode cluster \
--conf "spark.local.dir=/xyz/warehouse/tmp" \
/xyzpath/java-test-1.0-SNAPSHOT.jar "$@"
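The post does not establish why the custom staging directory caused the failure. Two possibilities worth ruling out, both assumptions rather than confirmed causes, are that the directory does not exist or is not accessible to the submitting user, or that the scheme-less path resolves against a different file system than expected. A minimal sketch for checking both, using hypothetical function names and the standard `hdfs` CLI:

```shell
#!/usr/bin/env bash

# Succeed if the path carries an explicit URI scheme (hdfs://, s3a://, ...).
has_scheme() {
  case "$1" in
    *://*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Inspect a candidate staging directory. Requires a configured `hdfs` client;
# nothing here runs until the function is called.
check_staging_dir() {
  local dir="$1"
  if ! has_scheme "$dir"; then
    echo "No scheme on $dir; it resolves against fs.defaultFS:"
    hdfs getconf -confKey fs.defaultFS
  fi
  if hdfs dfs -test -d "$dir"; then
    # -d lists the directory entry itself, showing owner, group, and mode.
    hdfs dfs -ls -d "$dir"
  else
    echo "$dir does not exist or is not a directory" >&2
    return 1
  fi
}
```

Running `check_staging_dir /xyz/warehouse/spark` as the submitting user shows which file system the path lands on and who owns the directory, which can then be compared with the default staging location (the user's home directory on the default file system).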

