
Spark: driver/worker configuration. Does driver run on Master node?

I am starting a Spark cluster on AWS, with one master node and 60 core nodes:

[screenshot: cluster configuration]

Here is the command to start it up, basically 2 executors per core node, for 120 executors in total:

spark-submit --deploy-mode cluster --master yarn-cluster \
  --driver-memory 180g --driver-cores 26 \
  --executor-memory 90g --executor-cores 13 \
  --num-executors 120

However, the job tracker shows only 119 executors:

[screenshot: executors shown in the job tracker]

I expected there to be 1 driver + 120 working executors. Instead, what I saw was 119 executors, consisting of 1 driver + 118 working executors.

Does that mean my Master node was not used? Is the driver running on the Master node or a Core node? Can I make the driver run on the Master node and let the 60 Core nodes host 120 working executors?

Thanks!

In yarn-cluster mode, the driver runs in the Application Master. This means that the same process is responsible for both driving the application and requesting resources from YARN, and this process runs inside a YARN container. The client that starts the app doesn't need to stick around for its entire lifetime.

[diagram: yarn-cluster mode]

In yarn-client mode, the Spark driver runs inside the client process that initiates the Spark application.

[diagram: yarn-client mode]

Have a look at the Cloudera blog for more details.

When you're running in yarn-cluster mode, the driver of the application runs within the cluster, rather than on the machine from which you ran spark-submit. This means it takes up the requested driver cores and memory on the cluster, resulting in the 119 executors that you see.

If you want to run your driver outside of the cluster, try yarn-client mode.

More details about running on YARN can be found here: http://spark.apache.org/docs/latest/running-on-yarn.html

When using cluster mode, the resource allocation has the structure shown in the following diagram.

[diagram: resource allocation in cluster mode]

I will attempt to illustrate the resource-allocation calculations as made by YARN. First of all, the specs of each of the core nodes are the following (you can confirm here):

  • memory: 244 GB
  • cores/vCPUs: 32

This means that you can run at maximum:

  • 2 executors per core node, which is calculated based on the memory and cores requested. Specifically, floor(available_cores / requested_cores) = floor(32 / 13) = 2 and floor(available_mem / requested_mem) = floor(244 / 90) = 2.
  • a single driver, with no room for any additional executor on the same core node. This is because when the driver runs on a core node, it leaves 244 - 180 = 64 GB of memory and 32 - 26 = 6 cores/vCPUs, which are not enough to run a separate executor (one executor needs 90 GB and 13 cores).

So, from the existing pool of 60 core nodes, 1 node is used for the driver, leaving 59 remaining core nodes, which are running 59*2 = 118 executors.
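A minimal shell sketch of the same arithmetic (integer division in the shell floors the result, which matches how whole containers are packed onto a node):

# executors that fit on one 32-core / 244 GB core node
echo $(( 32 / 13 ))    # by cores:  2
echo $(( 244 / 90 ))   # by memory: 2
# 59 non-driver nodes, 2 executors each
echo $(( 59 * 2 ))     # 118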

Does that mean my Master node was not used?

If you mean whether the master node was used to execute the driver, then the answer is no. However, note that the master node was probably running a number of other processes in the meantime, which are out of scope in the context of this discussion (e.g. the YARN ResourceManager, the HDFS NameNode, etc.).
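If you want to verify this yourself (a hedged suggestion, assuming shell access to the cluster), the YARN CLI can list the hosts registered as NodeManagers, i.e. the only hosts YARN can place the driver or executors on; on a typical EMR setup the master node is not among them:

# list the nodes YARN can schedule containers on
yarn node -list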

Is the driver running on the Master node or Core node?

The latter; the driver is running on a core node (since you used the --deploy-mode cluster parameter).
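To see which core node that is (another hedged suggestion; <application_id> is a placeholder for your application's ID, which you can get from the YARN UI or from yarn application -list), the application report includes the AM host, and in cluster mode the AM hosts the driver:

# print the application report, including the AM host
yarn application -status <application_id>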

Can I make the driver run on the Master node and let the 60 Core nodes host 120 working executors?

Yes! The way to do that is to execute the same command, but with --deploy-mode client (or leave that parameter unspecified, since at the time of writing client is the default deploy mode in Spark), on the master node.
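As a sketch, the adjusted submission run on the master node would look like the following (note that with --deploy-mode client the master URL is just yarn; on older Spark versions the equivalent spelling of the whole combination was --master yarn-client):

spark-submit --deploy-mode client --master yarn \
  --driver-memory 180g --driver-cores 26 \
  --executor-memory 90g --executor-cores 13 \
  --num-executors 120

Keep in mind that the driver now needs its 180 GB and 26 cores on the master node itself, so the master instance must be large enough to host it.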

By doing that, the resource allocation will have the structure shown in the following diagram.

[diagram: resource allocation in client mode]

Note that the Application Master will still consume some resources from the cluster ("stealing" some resources from the executors). However, the AM resources are minimal by default, as can be seen here (the spark.yarn.am.memory and spark.yarn.am.cores options), so it should not have a big impact.
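For example, if you ever needed to size the client-mode AM explicitly, these options can be passed at submission time (a sketch; 512m of memory and 1 core are, at the time of writing, the documented defaults, shown here only to make the knobs visible):

spark-submit --deploy-mode client --master yarn \
  --conf spark.yarn.am.memory=512m \
  --conf spark.yarn.am.cores=1 \
  --driver-memory 180g --driver-cores 26 \
  --executor-memory 90g --executor-cores 13 \
  --num-executors 120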
