
How to submit a Spark job remotely

I have a Node.js server where Spark is NOT installed, and a Spark-with-YARN setup on a different server.

The requirement is to run a Spark job from the Node.js server remotely. Can someone help with this?

Thanks for the prompt response.

It's not possible to spawn a process remotely. I suggest the following approaches:

  1. Install Spark where your Node server is running, and use it as a client pointing to your actual Spark cluster. Your Node server can use this client to trigger the job in client mode on the remote cluster (see the sketch after this list).
  2. You can set up a REST API on the Spark cluster and let your Node server hit an endpoint of this API, which will trigger the job.
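For option 1, here is a minimal sketch in Node.js, assuming spark-submit lives at /opt/spark/bin/spark-submit on the Node server, the cluster's Hadoop configuration has been copied to /etc/hadoop/conf, and the application class and JAR path are hypothetical placeholders:

    const { spawn } = require('child_process');

    function submitSparkJob() {
      // spark-submit runs locally but targets the remote YARN cluster,
      // which it locates through HADOOP_CONF_DIR.
      const sparkSubmit = spawn('/opt/spark/bin/spark-submit', [
        '--master', 'yarn',
        '--deploy-mode', 'client',            // driver runs on the Node server
        '--class', 'com.example.MySparkApp',  // hypothetical main class
        '/opt/jobs/my-spark-app.jar',         // hypothetical application JAR
      ], {
        env: { ...process.env, HADOOP_CONF_DIR: '/etc/hadoop/conf' },
      });

      sparkSubmit.stdout.on('data', (d) => console.log(`spark-submit: ${d}`));
      sparkSubmit.stderr.on('data', (d) => console.error(`spark-submit: ${d}`));
      sparkSubmit.on('close', (code) => console.log(`spark-submit exited with code ${code}`));
    }

    submitSparkJob();

Client deploy mode keeps the driver on the Node server, so job output streams straight back to the spawned process; cluster mode would move the driver onto YARN instead.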

Elaborating on the answers above: option 1 requires that Spark is installed on both systems, the one hosting the Node server and the actual Spark cluster. Spark on the Node server acts as a client to the main Spark cluster. Option 2 focuses on creating a REST API that handles triggers; these triggers initiate the Spark job directly on the main cluster, which saves the second installation.
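For option 2, Apache Livy is one widely used REST interface for submitting Spark jobs. A minimal sketch, assuming a Livy server is reachable at livy-host:8998, Node 18+ for the built-in fetch, and a hypothetical JAR path and class name:

    async function submitViaLivy() {
      // POST /batches submits a batch job; Livy returns an id and a state.
      const res = await fetch('http://livy-host:8998/batches', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          file: 'hdfs:///jobs/my-spark-app.jar', // must be reachable from the cluster
          className: 'com.example.MySparkApp',   // hypothetical main class
        }),
      });
      let batch = await res.json();
      console.log(`Submitted batch ${batch.id}, state: ${batch.state}`);

      // Poll GET /batches/{id} until the job leaves its transient states.
      while (batch.state === 'starting' || batch.state === 'running') {
        await new Promise((resolve) => setTimeout(resolve, 5000));
        const poll = await fetch(`http://livy-host:8998/batches/${batch.id}`);
        batch = await poll.json();
        console.log(`Batch ${batch.id} state: ${batch.state}`);
      }
    }

    submitViaLivy().catch(console.error);

With this approach the Node server only needs HTTP access to the Livy endpoint, and Spark stays installed solely on the cluster side.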

Hope this helps.
