I am quite new to Spark, and I am trying to start a Spark job from inside my application (without using spark-submit.sh) in yarn-cluster mode. What I can't figure out is how the job learns where the YARN ResourceManager is running. So far I have:
SparkConf sConf = new SparkConf().setMaster("yarn-cluster").set("spark.driver.memory", "10g");
But I don't see how to configure the location of the YARN ResourceManager. Any ideas on how to go about this? I have a clustered setup where the YARN RM does not run on the same machine as my application.
Look into the Spark Launcher API (org.apache.spark.launcher Javadoc), or read about it here: SparkLauncher — Launching Spark Applications.
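A minimal sketch of launching a job on YARN programmatically with SparkLauncher. The paths, main class, and ResourceManager host are placeholders you would replace with your own; the key point is that the launcher finds the RM through the yarn-site.xml under the HADOOP_CONF_DIR you pass in its environment:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchOnYarn {
    public static void main(String[] args) throws Exception {
        // The launcher reads yarn-site.xml from this directory to locate the RM.
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_CONF_DIR", "/etc/hadoop/conf"); // assumption: your client config dir

        SparkAppHandle handle = new SparkLauncher(env)
                .setSparkHome("/opt/spark")                  // assumption: Spark install path
                .setAppResource("/path/to/my-spark-app.jar") // hypothetical application jar
                .setMainClass("com.example.MySparkJob")      // hypothetical main class
                .setMaster("yarn-cluster")
                .setConf(SparkLauncher.DRIVER_MEMORY, "10g")
                .startApplication();

        // startApplication() returns immediately; poll the handle for state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        System.out.println("Final state: " + handle.getState());
    }
}
```

Note that the application is launched as a child process, so the HADOOP_CONF_DIR / YARN_CONF_DIR entries in the environment map are what make the RM address visible to it, not the environment of your own JVM.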
The ResourceManager properties come from yarn-site.xml, which is located in the directory pointed to by your HADOOP_CONF_DIR or YARN_CONF_DIR environment variable; these variables are set either at the OS level or in spark-env.sh.
In a non-HA deployment, the property you are looking for is yarn.resourcemanager.address.
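For illustration, a yarn-site.xml fragment setting that property (the hostname is a placeholder; 8032 is YARN's default RM client port):

```xml
<!-- yarn-site.xml: replace rm-host.example.com with your ResourceManager host -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm-host.example.com:8032</value>
</property>
```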