How to calculate the executor memory, number of executors, number of executor cores, and driver memory to read a 40 GB file using Spark?

YARN cluster configuration: 8 nodes, 8 cores per node, 8 GB RAM per node, 1 TB hard disk per node

Executor memory & number of executors

Executor memory and the number of executors per node are interlinked, so first choose either the executor memory or the number of executors, and then set the related properties as described below to get the desired result.

In YARN, these choices determine the number of containers (i.e. Spark executors) that can be instantiated on a NodeManager, based on the spark.executor.cores and spark.executor.memory property values (plus the executor memory overhead).
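
As an illustration (added here, not part of the original answer), this minimal PySpark sketch shows where these two properties are typically set; the values match the first example below, and the same settings can equally be passed to spark-submit via --conf.

from pyspark.sql import SparkSession

# Minimal sketch: set executor sizing when the application is launched.
# On YARN these settings only take effect at application start, so they are
# usually passed via spark-submit --conf; setting them on the builder works
# when no SparkContext is running yet.
spark = (
    SparkSession.builder
    .appName("executor-sizing-example")
    .config("spark.executor.cores", "2")    # cores per executor
    .config("spark.executor.memory", "4g")  # JVM heap per executor
    .getOrCreate()
)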

For example, on a cluster with 10 nodes (16 GB RAM and 6 cores each) configured with the following YARN properties:

yarn.scheduler.maximum-allocation-mb=10GB 
yarn.nodemanager.resource.memory-mb=10GB
yarn.scheduler.maximum-allocation-vcores=4
yarn.nodemanager.resource.cpu-vcores=4

Then with the Spark properties spark.executor.cores=2 and spark.executor.memory=4GB, you can expect 2 executors per node, so in total you get 19 executors + 1 container for the driver.

If the Spark properties are spark.executor.cores=3 and spark.executor.memory=8GB, you get only 9 executors (1 executor per node) + 1 container for the driver.
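
As a sanity check (a sketch added here, not part of the original answer), the executor counts above can be reproduced with a few lines of Python, assuming Spark's default executor memory overhead of max(384 MB, 10% of spark.executor.memory) and ignoring YARN's rounding to yarn.scheduler.minimum-allocation-mb:

def executors_per_node(node_mem_gb, node_vcores, executor_mem_gb, executor_cores):
    # Default executor memory overhead: max(384 MB, 10% of spark.executor.memory)
    overhead_gb = max(0.384, 0.10 * executor_mem_gb)
    container_mem_gb = executor_mem_gb + overhead_gb
    by_memory = int(node_mem_gb // container_mem_gb)  # memory-bound limit
    by_cores = int(node_vcores // executor_cores)     # vcore-bound limit
    return min(by_memory, by_cores)

def total_executors(nodes, node_mem_gb, node_vcores, executor_mem_gb, executor_cores):
    containers = nodes * executors_per_node(node_mem_gb, node_vcores,
                                            executor_mem_gb, executor_cores)
    return containers - 1  # reserve one container for the driver (cluster mode)

print(total_executors(10, 10, 4, executor_mem_gb=4, executor_cores=2))  # 19
print(total_executors(10, 10, 4, executor_mem_gb=8, executor_cores=3))  # 9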

Driver memory

spark.driver.memory — Maximum size of each Spark driver's Java heap memory.

spark.yarn.driver.memoryOverhead — Amount of extra off-heap memory that can be requested from YARN, per driver. This, together with spark.driver.memory, is the total memory that YARN can use to create a JVM for a driver process.

Spark driver memory does not impact performance directly, but it ensures that Spark jobs run without memory constraints at the driver. Adjust the total amount of memory allocated to a Spark driver using the following guidelines, where X is the value of yarn.nodemanager.resource.memory-mb:

  • 12 GB when X is greater than 50 GB
  • 4 GB when X is between 12 GB and 50 GB
  • 1 GB when X is between 1 GB and 12 GB
  • 256 MB when X is less than 1 GB

These numbers are for the sum of spark.driver.memory and spark.yarn.driver.memoryOverhead. Overhead should be 10-15% of the total.
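
As an illustration (an added sketch, not from the original answer) of how the table above and the 10-15% overhead rule combine, assuming a 10% overhead fraction:

def driver_memory_split(node_manager_mem_gb, overhead_fraction=0.10):
    # Total driver allocation chosen from yarn.nodemanager.resource.memory-mb (X)
    x = node_manager_mem_gb
    if x > 50:
        total_mb = 12 * 1024
    elif x >= 12:
        total_mb = 4 * 1024
    elif x >= 1:
        total_mb = 1024
    else:
        total_mb = 256
    overhead_mb = int(total_mb * overhead_fraction)  # spark.yarn.driver.memoryOverhead
    heap_mb = total_mb - overhead_mb                 # spark.driver.memory
    return heap_mb, overhead_mb

print(driver_memory_split(10))  # (922, 102) for a 10 GB NodeManager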

You can also follow this Cloudera guide for tuning Spark jobs.
