I am using EMR 4.1.0 + Spark 1.5.0 + YARN to process big data. I am trying to utilize the full cluster, but somehow YARN is not allocating all the resources.
yarn.nodemanager.resource.memory-mb -> 53856
yarn.nodemanager.resource.cpu-vcores -> 26
yarn.scheduler.capacity.resource-calculator -> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator (so YARN can manage both memory and cores)
Then I started PySpark with:

pyspark --master yarn-client --num-executors 24 --executor-memory 8347m --executor-cores 4
But the RM UI shows the following:

It allocates only 21 containers versus the requested 24. Shouldn't the 27 GB of reserved memory and 12 reserved cores be enough to allocate 3 more containers?
What am I missing here?
Thank You!
From here, it looks like your base should be 53248M. Additionally, there is a 10% memory overhead that must be accounted for (spark.yarn.executor.memoryOverhead), so 53248 * 0.9 = 47923M can be allocated on each node. If you allocate 8347M per executor, each node can only hold 5 of them: 47923 - 5 * 8347 = 6188M, which is not enough free memory to launch a 6th executor. The last 3 executors (one on each node) are not launching because there is not enough memory for them. If you want 24 containers, launch with --executor-memory 7987M.
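The per-node arithmetic above can be sketched as a quick calculation. The 10%-overhead model and the 53248M figure come from the answer; the helper name is my own:

```python
# Per-node sizing model from the answer above: YARN adds ~10% overhead
# (spark.yarn.executor.memoryOverhead) on top of each executor's heap,
# so only ~90% of the node's memory is usable for executor heaps.

def executors_per_node(node_mem_mb, executor_mem_mb, overhead_frac=0.10):
    """How many executors of the given --executor-memory fit on one node."""
    usable = node_mem_mb * (1 - overhead_frac)
    return int(usable // executor_mem_mb)

node_mem = 53248  # yarn.nodemanager.resource.memory-mb suggested in the answer

print(executors_per_node(node_mem, 8347))  # -> 5 (a 6th executor does not fit)
print(executors_per_node(node_mem, 7987))  # -> 6 (all requested containers fit)
```

With 8347M executors the division leaves a 6188M remainder, which is why the last container on each node is never granted.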
Note that you will have 6 unused cores per node if you use this configuration. This spreadsheet could help you find the best configuration for any type/size of cluster.
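In the same spirit as that spreadsheet, a small sweep can show, for each executor-core count, the largest --executor-memory that still fills a node. This is a sketch under the same 10%-overhead assumption as above; the helper name is mine:

```python
# For a node with 53248M and 26 vcores, try a few --executor-cores values
# and compute the largest --executor-memory that packs the node fully,
# assuming ~10% of node memory is lost to executor memory overhead.

def max_executor_mem(node_mem_mb, executors_per_node, overhead_frac=0.10):
    """Largest --executor-memory (MB) so that `executors_per_node`
    executors fit within the node's usable memory."""
    return int(node_mem_mb * (1 - overhead_frac) // executors_per_node)

node_mem, node_vcores = 53248, 26
for cores in (2, 3, 4):
    n = node_vcores // cores          # executors per node, limited by CPU
    mem = max_executor_mem(node_mem, n)
    print(f"{cores} cores -> {n} executors/node, --executor-memory {mem}M")
```

For 4 cores per executor this yields 7987M, which matches the recommendation above.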