
Unable to run Spark application on EMR using cluster mode

I have a Spark application that I am trying to run on Amazon EMR. On the cluster it either fails or stays in the running state and never finishes, while the same code completes in 2-3 minutes on my local machine. I suspect the problem is in how I create the Spark session. My current configuration, including the master setting, is below:

val spark = SparkSession.builder
  .master("local[2]")
  .appName("Graph Creation")
  .config("spark.sql.warehouse.dir", "warehouse")
  .config("spark.sql.shuffle.partitions", "1")
  .getOrCreate()

How can I build the Spark session so that it runs both on my local machine and on Amazon EMR without issues?

Avoid using a local master URL on an EMR cluster, because you get no benefit from the worker nodes. A local master means that Spark runs entirely on the machine where it is launched and does not try to use the other nodes in the cluster. The main purpose of local is local testing. Whenever you want to run on a cluster, you should choose a cluster manager instead (YARN, Mesos, Spark standalone, or Kubernetes).

Remove the hard-coded .master(...) call from the code, because a master set in code takes precedence over the one passed on the command line. Then provide the master URL as an argument to the spark-submit command: pass 'local' when you run locally and 'yarn' when you run on the EMR cluster. For example:

val spark = SparkSession.builder
  .appName("Graph Creation")
  .config("spark.sql.warehouse.dir", "warehouse")
  .config("spark.sql.shuffle.partitions", "1")
  .getOrCreate()

And then locally:

./bin/spark-submit --master local[2] ...

On EMR:

./bin/spark-submit --master yarn ...
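
If you also want the application to start from an IDE without spark-submit, one possible variation is to fall back to a local master only when none has been supplied. The following is a minimal sketch, not part of the original answer: the object name GraphCreation and the helper withDefaultMaster are hypothetical, and it assumes that spark-submit exposes the chosen master to the driver as the spark.master property.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object GraphCreation {
  // Hypothetical helper: keep a master that is already configured
  // (for example by spark-submit --master yarn), otherwise fall back to local mode.
  def withDefaultMaster(conf: SparkConf, fallback: String = "local[2]"): SparkConf =
    if (conf.contains("spark.master")) conf else conf.setMaster(fallback)

  def main(args: Array[String]): Unit = {
    // new SparkConf() loads spark.* system properties; the assumption here is
    // that spark-submit passes --master to the driver this way.
    val conf = withDefaultMaster(new SparkConf())

    val spark = SparkSession.builder
      .appName("Graph Creation")
      .config(conf)
      .config("spark.sql.warehouse.dir", "warehouse")
      .config("spark.sql.shuffle.partitions", "1")
      .getOrCreate()

    try {
      // Application logic goes here; printing the master shows which mode was picked.
      println(s"Running with master: ${spark.sparkContext.master}")
    } finally {
      // Stop the session explicitly once the job is done.
      spark.stop()
    }
  }
}
```

With this variation the same jar can be submitted with --master yarn on EMR, while a plain run from an IDE falls back to local[2].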
