
How does Spark know where the Yarn Resource Manager is running when not using spark-submit.sh?

I am quite new to Spark, and I am trying to start a Spark job from inside my application (without using spark-submit.sh) in yarn-cluster mode. I am trying to figure out how the job finds out where the Yarn ResourceManager is running. I have done:

SparkConf sConf = new SparkConf().setMaster("yarn-cluster").set("spark.driver.memory", "10g");

But what I am not able to configure is the location of the Yarn ResourceManager. Any ideas on how to go about doing this? I have a clustered setup where the Yarn RM does not run on the same machine as the application.

Look into the Spark Launcher API - org.apache.spark.launcher Javadoc.
Or read about it here - SparkLauncher — Launching Spark Applications.
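For illustration, here is a minimal sketch of launching a job in yarn-cluster mode through SparkLauncher. The Spark home, application jar path, main class, and Hadoop conf directory are placeholder assumptions you would replace with your own values:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class JobStarter {
    public static void main(String[] args) throws Exception {
        // Point the launched child process at the Hadoop config directory
        // containing yarn-site.xml; this is how YARN mode locates the RM.
        Map<String, String> env = new HashMap<>();
        env.put("HADOOP_CONF_DIR", "/etc/hadoop/conf"); // placeholder path

        SparkAppHandle handle = new SparkLauncher(env)
                .setSparkHome("/opt/spark")                  // placeholder Spark install dir
                .setAppResource("/path/to/my-spark-app.jar") // placeholder application jar
                .setMainClass("com.example.MySparkJob")      // placeholder main class
                .setMaster("yarn-cluster")
                .setConf(SparkLauncher.DRIVER_MEMORY, "10g")
                .startApplication();

        // Poll until the application reaches a terminal state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        System.out.println("Final state: " + handle.getState());
    }
}
```

With HADOOP_CONF_DIR set in the child environment, Spark's YARN client reads yarn-site.xml from that directory to find the ResourceManager, which is what the next answer describes.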

The properties can be found in yarn-site.xml, which lives in the directory pointed to by the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable; these variables are set either at the OS level or in spark-env.sh.

In a non-HA deployment, you are looking for yarn.resourcemanager.address, as in the sketch below.
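For reference, the relevant yarn-site.xml entry looks roughly like this (the hostname is a placeholder; 8032 is YARN's default ResourceManager port):

```xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm-host.example.com:8032</value>
</property>
```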
