EMR 4.1.0 + Spark 1.5.0 + YARN Resource Allocation

I am using EMR 4.1.0 + Spark 1.5.0 + YARN to process big data. I am trying to utilize the full cluster, but somehow YARN is not allocating all the resources.

  • Using 4 x c3.8xlarge EC2 slave nodes (60.0 GB memory and 32 cores each)
  • According to this article, I have set the following parameters in the EMR cluster:

yarn.nodemanager.resource.memory-mb -> 53856
yarn.nodemanager.resource.cpu-vcores -> 26
yarn.scheduler.capacity.resource-calculator -> org.apache.hadoop.yarn.util.resource.DominantResourceCalculator (so YARN can manage both memory and cores)
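For reference, a minimal sketch of how these three properties could be passed when creating the cluster with boto3's EMR client (the classification names are the standard EMR ones; the other run_job_flow arguments are omitted and would need to be filled in):

```python
# Sketch only: the three settings from the question expressed as EMR
# "Configurations" entries. Note yarn.scheduler.capacity.resource-calculator
# lives in capacity-scheduler.xml, hence its own classification.
import boto3

configurations = [
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.nodemanager.resource.memory-mb": "53856",
            "yarn.nodemanager.resource.cpu-vcores": "26",
        },
    },
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.resource-calculator":
                "org.apache.hadoop.yarn.util.resource."
                "DominantResourceCalculator",
        },
    },
]

emr = boto3.client("emr")
# Passed as the Configurations parameter at cluster creation (required
# arguments such as Name, Instances, and ReleaseLabel omitted here):
# emr.run_job_flow(..., Configurations=configurations)
```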

Then I started pyspark with:

pyspark --master yarn-client --num-executors 24 --executor-memory 8347m --executor-cores 4

But the RM UI shows the following:

[screenshot of the YARN Resource Manager UI]

It allocates only 21 containers versus the 24 requested. The 27 GB of reserved memory and 12 reserved cores could be used to allocate 3 more containers, right?

What am I missing here?

Thank You!

From here, it looks like your base should be 53248M. Additionally, there is a 10% memory overhead that must be accounted for (spark.yarn.executor.memoryOverhead), which leaves 53248 * 0.9 = 47923M that can be allocated on each node. If you allocate 8347M to each executor, each node can only hold 5 of them: 47923 - 5 * 8347 = 6188M, which is not enough free memory to launch a 6th executor. That is why the last 3 executors are not launching: there is not enough memory for them. If you want to have 24 containers (6 executors per node), launch with --executor-memory 7987M, since 47923 / 6 = 7987M per executor.
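The arithmetic above is easy to check; here is a small Python sketch of it (variable names are mine, and the flat 10% is this answer's simplification of spark.yarn.executor.memoryOverhead, which is really max(384M, 10% of executor memory) per executor):

```python
# Reproduce the answer's per-node memory math.
NODE_MB = 53248          # YARN memory per node (from the linked defaults)
USABLE = NODE_MB * 0.9   # reserve 10% for memory overhead -> 47923M

fit = int(USABLE // 8347)        # executors of 8347M that fit per node -> 5
leftover = USABLE - fit * 8347   # memory left over -> ~6188M, < 8347M
six_fit = int(USABLE // 6)       # executor size that fits 6 per node -> 7987M

print(fit, round(leftover), six_fit)   # 5 6188 7987
```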

Note that you will have 6 unused cores per node with this configuration. This spreadsheet could help you find the best configuration for any type/size of cluster:

https://docs.google.com/spreadsheets/d/1VH7Qly308hoRPu5VoLIg0ceolrzen-nBktRFkXHRrY4/edit#gid=1524766257
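If you just want a quick answer without the spreadsheet, a rough stand-in under the same assumptions (flat 10% overhead, memory split evenly across the executors the vcores allow; the function name and structure are mine, not the spreadsheet's) might look like:

```python
# Rough stand-in for the sizing spreadsheet, using the same 10%-overhead
# rule as the calculation above. Names and structure are illustrative.

def suggest(nodes, node_mb, node_vcores, cores_per_executor):
    usable_mb = node_mb * 0.9                     # leave 10% for overhead
    per_node = node_vcores // cores_per_executor  # executors the cores allow
    executor_mb = int(usable_mb // per_node)      # split memory evenly
    return {
        "--num-executors": nodes * per_node,
        "--executor-memory": "%dM" % executor_mb,
        "--executor-cores": cores_per_executor,
    }

# The cluster from the question: 4 nodes with 53248M and 26 vcores each,
# 4 cores per executor.
print(suggest(4, 53248, 26, 4))
# {'--num-executors': 24, '--executor-memory': '7987M', '--executor-cores': 4}
```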
