
Understanding spark.yarn.executor.memoryOverhead

I am running a Spark application on YARN, with the driver and executor memory set as --driver-memory 4G --executor-memory 2G.

When I run the application, an exception is thrown complaining: Container killed by YARN for exceeding memory limits. 2.5 GB of 2.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

What does this 2.5 GB mean here (overhead memory, executor memory, or overhead + executor memory)? I ask because when I change the memory settings to:

--driver-memory 4G --executor-memory 4G --conf spark.yarn.executor.memoryOverhead=2048, then the exception disappears.

Although I have only boosted the overhead memory to 2 GB, which is still under 2.5 GB, why does it work now?

Let us understand how memory is divided among the various regions in Spark.

  1. Executor MemoryOverhead:

spark.yarn.executor.memoryOverhead = max(384 MB, 0.07 * spark.executor.memory). In your first case, memoryOverhead = max(384 MB, 0.07 * 2 GB) = max(384 MB, 143.36 MB) = 384 MB. Hence, 384 MB is reserved in each executor, assuming you have assigned a single core per executor.
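As a quick sanity check, here is that formula applied to both of your settings, as a minimal Scala sketch (the constants mirror the defaults quoted above; everything else is illustrative):

```scala
// Defaults quoted above: a 384 MB floor and a 0.07 fraction of executor memory.
val minOverheadMb    = 384
val overheadFraction = 0.07

def overheadMb(executorMemoryMb: Int): Long =
  math.max(minOverheadMb, (overheadFraction * executorMemoryMb).toLong)

println(overheadMb(2048)) // first case  (--executor-memory 2G): max(384, 143) = 384
println(overheadMb(4096)) // second case (--executor-memory 4G): max(384, 286) = 384
```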

  2. Execution and Storage Memory:

By default spark.memory.fraction = 0.6, which implies that execution and storage, as a unified region, occupy 60% of the memory remaining after memoryOverhead (2048 MB - 384 MB = 1664 MB), i.e. roughly 998 MB. There is no strict boundary allocated to each sub-region unless you enable spark.memory.useLegacyMode; otherwise they share a moving boundary.
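The 998 MB figure comes from the arithmetic below, which follows this answer's simplified model (real Spark also keeps a ~300 MB reserve inside the heap, which this model ignores):

```scala
// Simplified model (values in MB): unified and user memory are carved out of
// what remains after memoryOverhead is set aside.
val executorMemoryMb = 2048   // --executor-memory 2G
val overheadMb       = 384.0  // from step 1
val memoryFraction   = 0.6    // default spark.memory.fraction

val remainingMb = executorMemoryMb - overheadMb        // 1664 MB
val unifiedMb   = memoryFraction * remainingMb         // ~998 MB for execution + storage
val userMb      = (1 - memoryFraction) * remainingMb   // ~666 MB user memory (see step 3)
println(f"unified: $unifiedMb%.0f MB, user: $userMb%.0f MB")
```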

  3. User Memory:

This is the memory pool that remains after the allocation of Execution and Storage Memory, and it is completely up to you to use it in any way you like. You can store your own data structures there that are used in RDD transformations. For example, you can rewrite a Spark aggregation by using a mapPartitions transformation that maintains a hash table, as sketched below. This comprises the remaining 40% of memory left after memoryOverhead; in your case that is ~660 MB.
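For illustration, a minimal sketch of such a mapPartitions-based aggregation (the dataset and names are made up; the point is that the per-partition hash table is allocated in user memory):

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.mutable

val spark = SparkSession.builder().appName("user-memory-demo").getOrCreate()
val sc = spark.sparkContext

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3), ("b", 4)))

// Per-partition pre-aggregation with a plain HashMap (lives in user memory),
// followed by a final merge across partitions.
val partial = pairs.mapPartitions { iter =>
  val acc = mutable.HashMap.empty[String, Int]
  iter.foreach { case (k, v) => acc.update(k, acc.getOrElse(k, 0) + v) }
  acc.iterator
}
partial.reduceByKey(_ + _).collect().foreach(println)
```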

If your job exceeds any of the above allocations, it is highly likely to end up with OOM problems.
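Putting the pieces together also explains the 2.5 GB in your error: YARN enforces the sum of executor memory and memoryOverhead, rounded up to a multiple of yarn.scheduler.minimum-allocation-mb. A sketch of that arithmetic, assuming that setting is 512 MB on your cluster (an assumption; the question does not state it):

```scala
// Assumed cluster setting (not given in the question):
// yarn.scheduler.minimum-allocation-mb = 512. YARN rounds each container
// request up to a multiple of this value.
val minAllocationMb = 512

def containerMb(executorMemoryMb: Int, overheadMb: Int): Int = {
  val requested = executorMemoryMb + overheadMb
  ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb
}

println(containerMb(2048, 384))  // first case:  2432 -> 2560 MB, i.e. the 2.5 GB limit
println(containerMb(4096, 2048)) // second case: 6144 -> 6144 MB, a much roomier container
```

Under that assumption, your second run raised the container limit from 2.5 GB to 6 GB, which is why the kill no longer triggers.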
