
Number of concurrently running mappers per node drops precipitously on Elastic MapReduce w/ AMI 3.1.0 and Hadoop 2.4.0 as cluster size increases

In a related question (How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce), I asked for formulas relating the number of concurrently running mappers/reducers to YARN and MR2 memory parameters. It turns out that on Elastic MapReduce, when my cluster has between 2 and 10 c3.2xlarge nodes, variations of the formulas mentioned there work okay, giving me 7-9 concurrently running mappers per node; but when the number of c3.2xlarges is 20 or 40, the cluster is underutilized: only 1-4 mappers run per node. Since my job is CPU-bound, this is particularly awful: MR2 delivers half the performance of MR1 for me.
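For reference, the kind of memory-based formula referred to above can be sketched roughly as follows. This is only an illustration, assuming the scheduler allocates by memory alone and rounds each request up to a multiple of `yarn.scheduler.minimum-allocation-mb`; the function name and the numbers are hypothetical, not the actual EMR defaults for any instance type.

```python
import math

def max_concurrent_mappers(node_memory_mb, map_memory_mb, min_allocation_mb):
    """Memory-only upper bound on map containers per node (hypothetical helper).

    node_memory_mb    ~ yarn.nodemanager.resource.memory-mb
    map_memory_mb     ~ mapreduce.map.memory.mb
    min_allocation_mb ~ yarn.scheduler.minimum-allocation-mb
    """
    # Assumption: each container request is rounded up to a multiple
    # of the scheduler's minimum allocation.
    container_mb = math.ceil(map_memory_mb / min_allocation_mb) * min_allocation_mb
    # Only whole containers fit on a node.
    return node_memory_mb // container_mb

# Illustrative values only
print(max_concurrent_mappers(11520, 1440, 256))
```

A bound like this depends only on per-node settings, so by itself it would not predict a drop in mappers per node as the cluster grows.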

Why is this happening?

You will be limited by what the NameNode can dish out. You can and should specify a larger instance type for the NameNode as you increase the number of task nodes. The MR1 page was never updated for c3s: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration.html
