
Running a Spark application on YARN, without spark-submit

I know that Spark applications can be executed on YARN using spark-submit --master yarn.

The question is: is it possible to run a Spark application on YARN using the yarn command?

If so, the YARN REST API could be used as an interface for running Spark and MapReduce applications in a uniform way.

Just like all YARN applications, Spark implements a Client and an ApplicationMaster when deploying on YARN. If you look at the implementation in the Spark repository, you'll get a clue as to how to create your own Client/ApplicationMaster: https://github.com/apache/spark/tree/master/yarn/src/main/scala/org/apache/spark/deploy/yarn. But out of the box it does not seem possible.
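For illustration, here is a hedged sketch of invoking Spark's bundled YARN Client class directly, which is roughly what spark-submit does internally in yarn-cluster mode. The classpath, jar path, class name, and argument are placeholders, and the exact entry point and options depend on your Spark version:

# A hedged sketch; paths, classpath, and arguments are placeholders and
# version-dependent (Spark 1.x exposes org.apache.spark.deploy.yarn.Client
# with --jar/--class/--arg options).
export HADOOP_CONF_DIR=/etc/hadoop/conf
java -cp "$SPARK_HOME/lib/*:$HADOOP_CONF_DIR" \
     org.apache.spark.deploy.yarn.Client \
     --jar /path/to/my-app.jar \
     --class my.MainClass \
     --arg someArgument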

I see this question is a year old, but to anyone else who stumbles across it: this looks like it should be possible now. I've been trying to do something similar and have been following the Starting Spark jobs directly via YARN REST API tutorial from Hortonworks.

Essentially what you need to do is upload your jar to HDFS, create a Spark job JSON file per the YARN REST API documentation, and then use a curl command to start the application. An example of that command is:

curl -s -i -X POST -H "Content-Type: application/json" ${HADOOP_RM}/ws/v1/cluster/apps \
     --data-binary @spark-yarn.json
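Note the @ in --data-binary @spark-yarn.json: without it, curl sends the literal string instead of the file contents. To make the whole flow concrete, here is a hedged end-to-end sketch. Every ID, path, command, and resource value below is a placeholder modeled loosely on the YARN REST API documentation, so check the docs for the exact payload your Hadoop version expects:

# 1. Upload the application jar to HDFS
hdfs dfs -put my-app.jar /apps/my-app.jar

# 2. Ask the ResourceManager for a fresh application ID
curl -s -X POST ${HADOOP_RM}/ws/v1/cluster/apps/new-application

# 3. Write spark-yarn.json, filling in the application-id returned above.
#    (A complete payload also needs local-resources and environment entries
#    so YARN can localize the Spark and application jars.)
cat > spark-yarn.json <<'EOF'
{
  "application-id": "application_1484231633049_0025",
  "application-name": "spark-on-yarn-rest",
  "am-container-spec": {
    "commands": {
      "command": "{{JAVA_HOME}}/bin/java -Xmx1024m org.apache.spark.deploy.yarn.ApplicationMaster --class my.MainClass --jar __app__.jar 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"
    }
  },
  "max-app-attempts": 1,
  "resource": { "memory": 1024, "vCores": 1 },
  "application-type": "SPARK"
}
EOF

# 4. Submit it with the curl command shown above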

I have not seen the latest package, but a few months back such a thing was not possible "out of the box" (this is info straight from Cloudera support). I know it's not what you were hoping for, but that's what I know.

Thanks for the question. As suggested above, writing your own ApplicationMaster is a good route to submit an application without invoking spark-submit. The community has built around the spark-submit command for YARN, adding flags that ease the addition of jars and/or configs needed to get the application to execute successfully. See the Submitting Applications page in the Spark documentation.
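For comparison, a typical spark-submit invocation using those flags might look like this; the jar names, class, and config value are placeholders:

# Placeholder jar names, class, and settings; shown only to illustrate the flags.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class my.MainClass \
  --jars extra-lib1.jar,extra-lib2.jar \
  --conf spark.executor.memory=2g \
  my-app.jar appArg1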

An alternate solution (you could try): you could have the Spark job as an action in an Oozie workflow, using the Oozie Spark extension. Depending on what you wish to achieve, either route looks good. Hope it helps.
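For a concrete starting point, here is a hedged sketch of such a workflow, assuming the Oozie Spark action schema; the schema versions, paths, and class name are placeholders to adapt to your cluster:

# Hedged sketch: workflow.xml with a Spark action; adjust schema versions,
# paths, and the class name to your cluster.
cat > workflow.xml <<'EOF'
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-node"/>
  <action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn-cluster</master>
      <name>MySparkJob</name>
      <class>my.MainClass</class>
      <jar>${nameNode}/apps/my-app.jar</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Spark action failed</message></kill>
  <end name="end"/>
</workflow-app>
EOF

# Then run it with the Oozie CLI (job.properties contents omitted here)
oozie job -oozie http://oozie-server:11000/oozie -config job.properties -run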
