I have a single-node Hadoop test setup running a MapReduce job that starts 96 mappers and 6 reducers. Before migrating to YARN, this job ran slowly but completed reliably. Under YARN it hangs every time, with most of the mappers stuck in the 'pending' state.
The job is actually 6 sub-jobs (16 mappers + 1 reducer each); this configuration mirrors the production process sequence. All of them run under a single JobControl. Is there any configuration I should check, or a best practice, for cases like this with a small number of nodes and jobs that are relatively large compared to the cluster size?
Of course, this is not about performance, just about developers being able to run this job at all. In the worst case I could reduce the job width by grouping sub-jobs, but I'd rather not: there is no reason to do that in production, and I'd like the test and production sequences to be the same.
When I migrated to YARN, the scheduler changed to FairScheduler, and currently that is the only option: I run Cloudera, and Cloudera strongly recommends against using anything but the Fair Scheduler. So switching to the FIFO scheduler is not an option.
Is there any alternative in my case, other than redesigning the job?
I eventually solved my troubles by disabling the 'queue per user' logic (switching to a single queue) and limiting the number of concurrently running applications in the allocation file. According to http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html, the allocation file lets you configure almost anything you need per queue.
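For illustration, a minimal allocation file along these lines should work; the queue name `default` is the scheduler's standard single queue, but the limit of 2 running applications is an assumption you would tune to your cluster size:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: the Fair Scheduler allocation file, whose path is
     given by yarn.scheduler.fair.allocation.file in yarn-site.xml.
     The limit of 2 is illustrative, not a recommended value. -->
<allocations>
  <queue name="default">
    <!-- Cap concurrently running applications so the sub-jobs' АМs
         cannot occupy all containers and starve the pending mappers -->
    <maxRunningApps>2</maxRunningApps>
  </queue>
</allocations>
```

The scheduler re-reads this file periodically, so the cap can be adjusted without restarting the ResourceManager.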
Here are the actual steps: in yarn-site.xml, yarn.scheduler.fair.user-as-default-queue was set to false, so every application lands in the single 'default' queue. Everything else, including the default scheduling policy, was left untouched. It now works as needed.
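The property change above would look like this in yarn-site.xml (a sketch of the setting described, nothing else changed):

```xml
<!-- yarn-site.xml: place applications in the 'default' queue instead of a
     per-user queue, so the allocation file's per-queue limits apply -->
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
```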