I have a single-node Hadoop test setup running a MapReduce job that starts 96 mappers and 6 reducers. Before migrating to YARN, this job ran slowly but completed reliably. Under YARN it hangs every time, with most of the mappers stuck in the 'pending' state.
The job is actually 6 sub-jobs (16 mappers + 1 reducer each); this configuration mirrors the production process sequence. All of them run under a single JobControl. Is there any configuration I should check, or a best practice, for cases like this with a small number of nodes and jobs that are relatively large compared to the cluster size?
Of course, this is not about performance, just about developers being able to run this job at all. In the worst case I could reduce the job width by grouping sub-jobs, but I'd rather not: there is no reason to do that in production, and I'd like the test and production sequences to be the same.
When I migrated to YARN, the scheduler changed to FairScheduler, and currently that is the only option: I run Cloudera, and Cloudera strongly recommends against using anything but the Fair Scheduler. So switching to the FIFO scheduler is not an option.
Is there any alternative in my case, other than redesigning the job?
I eventually solved my troubles by disabling the 'queue per user' logic (switching to a single queue) and limiting the number of concurrently running applications in the allocation file. According to http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html, the allocation file lets you configure almost anything you need per queue.
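For illustration, a minimal allocation file along these lines should work; the queue name `default` is the scheduler's standard single queue, but the limit of 2 running applications is an assumption you would tune to your cluster size:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: the Fair Scheduler allocation file, whose path is
     given by yarn.scheduler.fair.allocation.file in yarn-site.xml.
     The limit of 2 is illustrative, not a recommended value. -->
<allocations>
  <queue name="default">
    <!-- Cap concurrently running applications so the sub-jobs' АМs
         cannot occupy all containers and starve the pending mappers -->
    <maxRunningApps>2</maxRunningApps>
  </queue>
</allocations>
```

The scheduler re-reads this file periodically, so the cap can be adjusted without restarting the ResourceManager.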
Here are the actual steps: in yarn-site.xml, yarn.scheduler.fair.user-as-default-queue was set to false, so every application lands in the single 'default' queue. Everything else, including the default scheduling policy, was left untouched. It now works as needed.
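The property change above would look like this in yarn-site.xml (a sketch of the setting described, nothing else changed):

```xml
<!-- yarn-site.xml: place applications in the 'default' queue instead of a
     per-user queue, so the allocation file's per-queue limits apply -->
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
```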