
How to deal with executor memory and driver memory in Spark?

I am confused about dealing with executor memory and driver memory in Spark.

My environment settings are as below:

  • 128 GB of memory and 16 CPUs for 9 VMs
  • CentOS
  • Hadoop 2.5.0-cdh5.2.0
  • Spark 1.1.0

Input data information:

  • 3.5 GB data file from HDFS

For simple development, I executed my Python code in standalone cluster mode (8 workers, 20 cores, 45.3 GB memory) with spark-submit. Now I would like to set the executor memory or driver memory for performance tuning.
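For reference, both values can be set as spark-submit flags or through the spark.executor.memory / spark.driver.memory configuration keys. Below is a minimal sketch; the master URL, script name and memory sizes are only placeholders for illustration, and the driver size is best fixed at submission time since the driver JVM is already running by the time SparkConf is read.

```python
# Hypothetical submission command (values are placeholders):
#
#   spark-submit --master spark://master:7077 \
#                --driver-memory 2g \
#                --executor-memory 4g \
#                my_job.py
#
# The executor size can also be set programmatically via SparkConf:
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("memory-tuning-example")
        .set("spark.executor.memory", "4g"))   # heap requested per executor process
sc = SparkContext(conf=conf)
```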

From the Spark documentation, the definition of executor memory is:

Amount of memory to use per executor process, in the same format as JVM memory strings (eg 512m, 2g).

How about driver memory?

The memory you need to assign to the driver depends on the job.

If the job is based purely on transformations and terminates in some distributed output action like rdd.saveAsTextFile, rdd.saveToCassandra, ... then the memory needs of the driver will be very low. A few hundred MB will do. The driver is also responsible for delivering files and collecting metrics, but it is not involved in data processing.

If the job requires the driver to participate in the computation, e.g. some ML algorithm that needs to materialize results and broadcast them on the next iteration, then your job becomes dependent on the amount of data passing through the driver. Operations like .collect, .take and takeSample deliver data to the driver, and hence the driver needs enough memory to hold such data.

E.g. if you have an RDD of 3 GB in the cluster and call val myresultArray = rdd.collect, then you will need 3 GB of memory in the driver to hold that data, plus some extra room for the functions mentioned in the first paragraph.
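A rough PySpark illustration of the same point (the HDFS paths are hypothetical): the first pipeline ends in a distributed output action and never pulls data to the driver, while the second materializes the whole RDD in driver memory.

```python
from pyspark import SparkContext

sc = SparkContext(appName="driver-memory-demo")
rdd = sc.textFile("hdfs:///data/input-3.5g.txt")   # hypothetical 3.5 GB input

# Transformations ending in a distributed action: results are written out
# by the executors, so the driver heap can stay small.
rdd.map(lambda line: line.upper()).saveAsTextFile("hdfs:///data/output-upper")

# collect() ships every record back to the driver, so the driver now needs
# roughly the full dataset size in heap (plus some headroom) to hold it.
result = rdd.collect()
```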

In a Spark application, the driver is responsible for task scheduling and the executors are responsible for executing the concrete tasks in your job.

If you are familiar with MapReduce, your map tasks and reduce tasks are all executed in executors (in Spark, they are called ShuffleMapTasks and ResultTasks). Likewise, whatever RDD you want to cache is stored in the executors' JVM heap and on their disks.
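For example, caching an RDD keeps its partitions in the executors' heaps (spilling to their local disks if they do not fit), not on the driver. A small sketch, again with a hypothetical input path:

```python
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="cache-demo")
rdd = sc.textFile("hdfs:///data/input-3.5g.txt")   # hypothetical path

# Cached partitions live in the executors' JVM heaps and, with this storage
# level, spill to the executors' local disks when memory runs out.
cached = rdd.persist(StorageLevel.MEMORY_AND_DISK)
print(cached.count())   # the first action materializes the cache on the executors
```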

So I think a few GB will be enough for your driver.

Spark shell required memory = (Driver Memory + 384 MB) + (Number of executors * (Executor memory + 384 MB))

Here, 384 MB is the memory overhead that Spark may use on top of each requested heap (driver and executors) when executing jobs.
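Plugging hypothetical numbers into this formula gives a quick sanity check for the cluster in the question, assuming one executor per worker (8 executors) and placeholder sizes of 2 GB for the driver and 4 GB per executor:

```python
# Hypothetical sizing: 2 GB driver, 4 GB per executor, 8 executors,
# 384 MB overhead per JVM, as in the formula above.
driver_memory_mb = 2 * 1024
executor_memory_mb = 4 * 1024
num_executors = 8
overhead_mb = 384

total_mb = (driver_memory_mb + overhead_mb) + \
           num_executors * (executor_memory_mb + overhead_mb)
print(total_mb / 1024.0)   # ~37.4 GB, within the 45.3 GB the cluster reports
```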
