Running Spark job on YARN

I am trying to utilise all of the resources I have on the cluster to run a Spark job. I have Cloudera Manager installed on all of the nodes. This is the command I use to submit the job:

spark-submit --master yarn \
             --deploy-mode cluster \
             file:///[spark python file] \
             file://[app argument 1] \
             file://[app argument 2]

During the execution I receive the following error:

diagnostics: Application application_1450777964379_0027 failed 2 times due to AM Container for appattempt_1450777964379_0027_000002 exited with  exitCode: 1
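
Exit code 1 by itself does not reveal the cause. Assuming log aggregation is enabled on the cluster, the full AM container log can usually be retrieved with the standard YARN CLI:

yarn logs -applicationId application_1450777964379_0027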

Any ideas on how to fix it would be much appreciated.

EDIT 1: The machine where Spark is installed is not accessible via the Web UI. I tried to download the sources and read a bit more about the exception:

------------------------------------------------------------
| Job | Description                                        | 
------------------------------------------------------------
| 0   | saveAsTextFile at NativeMethodAccessorImpl.java:-2 | 
------------------------------------------------------------

Taken from here,

If the path starts with file:// or hdfs://, the path becomes ile:// or dfs://. If the path is absolute, the first slash is removed.

There is no particular reason for this, and it needs to be fixed.

Try using an absolute path instead of file://.
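
For example, a minimal sketch of the corrected submission, with the bracketed placeholders standing in for your actual absolute paths:

spark-submit --master yarn \
             --deploy-mode cluster \
             /home/user/[spark python file] \
             /home/user/[app argument 1] \
             /home/user/[app argument 2]

Here the file:// scheme is dropped entirely and plain absolute paths are passed instead, which sidesteps the scheme-stripping behaviour quoted above.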
