Spark client never ends running in yarn-cluster mode
We are experiencing a weird problem with Spark 1.6.2. We're submitting our Spark applications in cluster mode. Everything is fine, but sometimes the client process which launched the application happens to hang. And the only way to unlock it is to inspect its stderr: then it finishes. Let me explain what I mean with an example.
We are on the edge node of our cluster and we run:
spark-submit --master yarn-cluster ... &
It turns out that the client process pid is 12435. Then, the Spark application runs and finishes (we can see it from YARN or the Spark UI). Nonetheless, on the edge node the process 12435 stays alive and never ends. Then, we try to inspect its output from /proc/12435/fd/2. When we do that, the process ends.
I can't understand what is happening and how to fix it. Does anybody have an idea?
Thank you, Marco
This has got nothing to do with Spark.
It is a shell issue. You are forgetting to redirect the error log anywhere.
Every command has two output streams, stdout and stderr, and you should redirect both of them when starting a background job.
If you want to redirect both outputs to the same file:
spark-submit --master yarn-cluster ... > ~/output.txt 2>&1 &
If you want the error log in one file and the output log in another:
spark-submit --master yarn-cluster ... > ~/output.txt 2>~/error.txt &
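As a sanity check, the redirection pattern can be exercised with any command that writes to both streams; `long_job` below is a hypothetical stand-in for `spark-submit`, not part of Spark itself:

```shell
#!/bin/sh
# Minimal sketch (generic shell, not Spark-specific): a stand-in
# function writes to both streams; redirecting each stream to its
# own file means the background job never blocks waiting for
# someone to read an unconsumed stream.
long_job() {                       # hypothetical stand-in for spark-submit
    echo "application log"         # written to stdout
    echo "diagnostic message" >&2  # written to stderr
}

long_job > /tmp/output.txt 2> /tmp/error.txt &
wait                               # wait for the background job to finish

cat /tmp/output.txt                # application log
cat /tmp/error.txt                 # diagnostic message
```

With both streams captured in files, the client process can exit on its own instead of stalling until something (like reading /proc/PID/fd/2) drains its output.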