
Where to find the Dataproc master node address and set setMaster() in a PySpark job?

I am running a PySpark job on a Dataproc cluster. It runs fine if I don't set a master, but I am wondering how to set one. I cannot find the URL of the master node. I tried copying the master node's Compute Engine IP address and calling setMaster('spark://<MASTER_COMPUTE_ENG_ADRESS>:7077'), but it throws an error.

Can someone tell me where to find the master node URL on GCP Dataproc, and how to set the master in a PySpark job?

Dataproc uses YARN as its resource manager and by default runs Spark jobs on YARN. In the Spark config, spark.master is set to yarn, so Spark automatically finds the YARN ResourceManager address from the YARN config /etc/hadoop/conf/yarn-site.xml.
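For illustration, this is roughly how a ResourceManager address can be read out of a yarn-site.xml document; a minimal standard-library sketch, where the sample XML, the hostname my-cluster-m, and the helper name are illustrative stand-ins rather than your actual cluster config:

```python
import xml.etree.ElementTree as ET

def yarn_property(xml_text, name):
    """Return the value of one <property> from a yarn-site.xml document, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Hypothetical snippet in the shape of /etc/hadoop/conf/yarn-site.xml:
SAMPLE = """
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>my-cluster-m</value>
  </property>
</configuration>
"""

print(yarn_property(SAMPLE, "yarn.resourcemanager.hostname"))  # my-cluster-m
```

Because Spark reads this file for you when spark.master is yarn, your PySpark code never needs to construct a master URL itself; SparkSession.builder.getOrCreate() with no master() call is enough on Dataproc.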

In general, you should not set the master explicitly on Dataproc unless you want your job to run outside of YARN. In that case, you first need to start the Spark master and workers manually to run Spark in standalone mode.
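If you really do want standalone mode, the steps look roughly like this. This is a sketch, not a tested recipe: it assumes the stock Dataproc Spark install location, the default standalone port 7077, a Spark 3.x layout (older versions use start-slave.sh instead of start-worker.sh), and uses my-cluster-m and my_job.py as placeholder names:

```shell
# On the Dataproc master node: start the standalone Spark master
/usr/lib/spark/sbin/start-master.sh

# On each node: start a worker, pointing it at the master's spark:// URL
/usr/lib/spark/sbin/start-worker.sh spark://my-cluster-m:7077

# Submit the PySpark job against the standalone master instead of YARN
spark-submit --master spark://my-cluster-m:7077 my_job.py
```

Note that the spark:// URL uses the standalone master port (7077 by default), not an arbitrary Compute Engine IP and port, which is why copying the VM's external IP into setMaster() fails when no standalone master is running.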

