Spark Job Submission: AWS EMR step or command line spark-submit

I am running an AWS EMR cluster with YARN as the master, in cluster deploy mode. All of the tutorials I have read run spark-submit through the AWS CLI as so-called "Spark Steps", using a command similar to the following:

aws emr add-steps --cluster-id j-2AXXXXXXGAPLF --steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[--class,org.apache.spark.examples.SparkPi,/usr/lib/spark/lib/spark-examples.jar,10]

My professor recommends that I submit my Spark applications by copying the files to the master node via SCP and then running the application over SSH:

ssh hadoop@ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com

Then I would put the data files into HDFS via the shell. Finally, I would simply run spark-submit:

spark-submit --master yarn --deploy-mode cluster my_spark_app.py my_hdfs_file.csv

What is the difference between submitting a "Spark Step" through the AWS CLI and running spark-submit over SSH on the master node? Will my Spark application still run in a distributed fashion if I submit the jobs from the master node?

Submitting an EMR step uses Amazon's custom-built step submission process, a relatively light wrapper that itself calls spark-submit. Fundamentally, there is little difference. Either way, with --master yarn the job is scheduled by YARN across the cluster's nodes, so it still runs in a distributed fashion. If you wish to be platform agnostic (i.e., not locked in to Amazon), use the SSH strategy, or try more advanced submission strategies such as remote submission or, one of my favorites, Livy.
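To make the equivalence concrete, here is a minimal dry-run sketch that builds both commands from the same spark-submit arguments. It only prints the commands and does not contact AWS or the cluster. The cluster ID and host name are the placeholders from the question. The absolute path /home/hadoop/my_spark_app.py is an assumption (a step may not run from your SSH login directory, so a relative path might not resolve), and the sketch also assumes the input file is already in HDFS.

```shell
#!/bin/sh
# Dry run: print the command each submission path would use.
# All values below are placeholders or assumptions, not real resources.
set -eu

CLUSTER_ID="j-2AXXXXXXGAPLF"
MASTER_HOST="ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com"
APP="/home/hadoop/my_spark_app.py"   # assumed location on the master node
INPUT="my_hdfs_file.csv"             # assumed to be in HDFS already

# The same spark-submit arguments are used by both paths.
SUBMIT_ARGS="--master yarn --deploy-mode cluster $APP $INPUT"

# Path 1: EMR step. A Type=Spark step passes Args to spark-submit,
# so the arguments are the same, only comma-separated.
emr_step_cmd() {
  step_args=$(printf '%s' "$SUBMIT_ARGS" | tr ' ' ',')
  printf 'aws emr add-steps --cluster-id %s --steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[%s]\n' \
    "$CLUSTER_ID" "$step_args"
}

# Path 2: run spark-submit directly on the master node over SSH.
ssh_cmd() {
  printf 'ssh hadoop@%s spark-submit %s\n' "$MASTER_HOST" "$SUBMIT_ARGS"
}

emr_step_cmd
ssh_cmd
```

The two printed commands differ only in how the arguments are delivered. What the step adds on top is EMR bookkeeping such as the step name and ActionOnFailure, which is the thin wrapper described above.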
