

Spark client never ends running in yarn-cluster mode

We are experiencing a weird problem with Spark 1.6.2. We are submitting our Spark applications in cluster mode. Everything is fine, but sometimes the client process which launched the application happens to hang. The only way to unblock it is to inspect its stderr: then it finishes. I will try to explain what I mean with an example.

We are on the edge node of our cluster and we run:

spark-submit --master yarn-cluster ... &

It turns out that the client process pid is 12435. Then, the Spark application runs and finishes (we can see it from YARN or the Spark UI). Nonetheless, on the edge node the process 12435 stays alive and never ends. Then, we try to inspect its output from /proc/12435/fd/2. When we do that, the process ends.
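Reading a process's stderr through /proc is Linux-specific; a minimal sketch of the diagnostic step described above, with a `sleep` standing in for the hung client process:

```shell
#!/bin/sh
# Start a placeholder background job (stands in for the hung spark-submit client).
sleep 2 &
pid=$!

# /proc/<pid>/fd/2 is the process's stderr; ls -l shows where it points
# (a terminal, a pipe, or a regular file).
ls -l /proc/$pid/fd/2

# Reading it (cat /proc/$pid/fd/2) is what "inspecting its stderr" means here.
wait $pid
```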

I can't understand what is happening and how to fix it. Does anybody have an idea?

Thank you, Marco

This has got nothing to do with Spark.

It is a shell issue. You are forgetting to redirect the error log anywhere.

Any command has two output streams, stdout and stderr, and you should redirect both of them when starting a background job.

If you want to redirect both outputs to the same file:

spark-submit --master yarn-cluster ...  > ~/output.txt 2>&1 &

If you want errors in one file and the output log in another:

spark-submit --master yarn-cluster ...  > ~/output.txt 2>~/error.txt &
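Either form can be checked with any command that writes to both streams; a minimal sketch of the second form, using placeholder file names under /tmp instead of the Spark command:

```shell
#!/bin/sh
# A toy command that writes one line to stdout and one to stderr,
# backgrounded with each stream redirected to its own file.
{ echo "normal output"; echo "error output" >&2; } \
    > /tmp/output.txt 2> /tmp/error.txt &
wait $!               # block until the background job finishes

cat /tmp/output.txt   # -> normal output
cat /tmp/error.txt    # -> error output
```

With both streams redirected to files, nothing the job writes can block on an unread terminal or pipe, so the client process can exit on its own.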
