
How does Spark know where the Yarn Resource Manager is running when not using spark-submit.sh?

I am quite new to Spark, and I am trying to start a Spark job from inside my application (without using spark-submit.sh) in yarn-cluster mode. I am trying to figure out how the job finds out where the Yarn ResourceManager is running. I have done:

SparkConf sConf = new SparkConf().setMaster("yarn-cluster").set("spark.driver.memory", "10g");

But what I am not able to configure is the location of the Yarn ResourceManager. Any ideas on how to go about doing this? I have a clustered setup where the Yarn RM does not run on the same machine as the application.

Look into the Spark Launcher API - org.apache.spark.launcher Javadoc.
Or read about it here - SparkLauncher — Launching Spark Applications.
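For illustration, here is a minimal sketch of launching a job in yarn-cluster mode through SparkLauncher. The Spark home, application jar path, main class, and Hadoop conf directory are placeholder assumptions you would replace with your own values:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class JobStarter {
    public static void main(String[] args) throws Exception {
        // Point the launched child process at the Hadoop config directory
        // containing yarn-site.xml; this is how YARN mode locates the RM.
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_CONF_DIR", "/etc/hadoop/conf"); // placeholder path

        SparkAppHandle handle = new SparkLauncher(env)
                .setSparkHome("/opt/spark")                  // placeholder Spark install dir
                .setAppResource("/path/to/my-spark-app.jar") // placeholder application jar
                .setMainClass("com.example.MySparkJob")      // placeholder main class
                .setMaster("yarn-cluster")
                .setConf(SparkLauncher.DRIVER_MEMORY, "10g")
                .startApplication();

        // Poll until the application reaches a terminal state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        System.out.println("Final state: " + handle.getState());
    }
}
```

With HADOOP_CONF_DIR set in the child environment, Spark's YARN client reads yarn-site.xml from that directory to find the ResourceManager, which is what the next answer describes.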

The properties can be found in yarn-site.xml, which lives in the directory pointed to by the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable; these variables are set either at the OS level or in spark-env.sh.

In a non-HA deployment, you are looking for yarn.resourcemanager.address, as in the sketch below.
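For reference, the relevant yarn-site.xml entry looks roughly like this (the hostname is a placeholder; 8032 is YARN's default ResourceManager port):

```xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm-host.example.com:8032</value>
</property>
```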
