
Hadoop Cannot set Reducers > 1

I am using Hadoop for a university assignment and I have the code working; however, I'm running into a small issue.

I am trying to set the number of reducers to 19 (which is 0.95 * capacity, as the docs suggest). However, when I view my job in the task tracker, it says 1 reducer in total.

System.err.println("here");
job.setNumReduceTasks(19);
System.err.println(job.getNumReduceTasks());

This yields, as expected:

here
19

But in the final output I get:

12/05/16 11:10:54 INFO mapred.JobClient:     Data-local map tasks=111
12/05/16 11:10:54 INFO mapred.JobClient:     Rack-local map tasks=58
12/05/16 11:10:54 INFO mapred.JobClient:     Launched map tasks=169
12/05/16 11:10:54 INFO mapred.JobClient:     Launched reduce tasks=1

The parts of the MapReduce job I have overridden are:

  • Mapper
  • Reducer
  • Partitioner
  • Grouping Comparator.

My first thought was that the partitioner was returning the same value for every key. I checked this and it was not the case.
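One way to run such a check without a cluster is sketched below in plain Java. It assumes the same formula as Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numPartitions`. The class name `PartitionSpread` and the sample keys are made up for illustration, and a custom partitioner's logic would replace the body of `partitionFor`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: counts how many keys land in each partition.
class PartitionSpread {

    // Assumption: mirrors the formula used by Hadoop's default HashPartitioner.
    // Replace this body with your own partitioner's logic to test it.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Builds a per-partition count for the given keys.
    static int[] histogram(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    // Number of partitions that received at least one key.
    static int nonEmpty(int[] counts) {
        int n = 0;
        for (int c : counts) {
            if (c > 0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Made-up sample keys; use a slice of your real map output keys instead.
        List<String> sample = Arrays.asList("alpha", "beta", "gamma", "delta", "epsilon");
        int[] counts = histogram(sample, 19);
        System.out.println(Arrays.toString(counts));
        System.out.println("non-empty partitions: " + nonEmpty(counts));
    }
}
```

If the keys spread over several buckets when `numPartitions` is 19, the partitioner itself is unlikely to be the cause. With this formula, a single partition always receives every key, since any value modulo 1 is 0.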

I have also checked that the grouper works correctly.

I am not sure what else could be causing this. If anyone could help it would be much appreciated.

I am very much not a Java person, so please use very explicit examples if you can.

PS: I did not set this cluster up; it was set up by the university, so I am unsure of any configuration variables.

PPS: There was too much code to post, so please let me know which code in particular you would like to see.

Edit: I was asked the following questions by TejasP:

Are you really running the code on Hadoop, or is it in local mode? (See whether your jobs are visible on the jobtracker and tasktracker.)

Yes, I am. It is viewable in the jobtracker UI, which also reports 1 reducer. Note: the settings.xml also has the reducers listed as 1.

Have you exported the HADOOP variables in the environment?

Yes, they are visible in env, and the code does not compile until I have set them.

env | grep HADOOP
HADOOP_HOME=/mnt/biginsights/opt/ibm/biginsights/IHC
HADOOP_CONF_DIR=/mnt/biginsights/opt/ibm/biginsights/hadoop-conf

Is the cluster single-node or multi-node? And even if the cluster has multiple nodes, are all the nodes healthy? Is there an issue with the other nodes?

Yes, there are multiple nodes (10). The job tracker reports:

Nodes: 10
Map Task Capacity: 20
Reduce Task Capacity: 20
Blacklisted Nodes: 0
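For reference, the 19 used in the question follows from the reported reduce capacity of 20 and the 0.95 factor mentioned above. A minimal sketch of that arithmetic, with a made-up class name and integer math to avoid floating-point rounding:

```java
// Hypothetical helper, not part of Hadoop: applies the 0.95 * capacity guideline.
class ReducerCount {

    // reduceTaskCapacity is the "Reduce Task Capacity" figure shown by the job tracker.
    static int suggestedReducers(int reduceTaskCapacity) {
        return reduceTaskCapacity * 95 / 100;
    }

    public static void main(String[] args) {
        System.out.println(suggestedReducers(20)); // prints 19
    }
}
```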

Are you using setNumReduceTasks correctly?

As stated above, I called set and then get and got back the value it was meant to be (19), but the final job still only used 1.

You can reduce your code to a small MapReduce job by removing details (this is just for debugging). Run it and see what happens. If you face the same issue, provide the reduced code in the original question.

I will try and edit again with the results

It looks like you are running it in LocalJobRunner mode (most likely from Eclipse). In this mode, if the number of reduce tasks is greater than 1, it is reset to 1. Take a look at the following few lines from LocalJobRunner.java:

int numReduceTasks = job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    // we only allow 0 or 1 reducer in local mode
    numReduceTasks = 1;
    job.setNumReduceTasks(1);
}
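A quick way to see which mode a client would pick is to look at `mapred.job.tracker` in the configuration it loads: in Hadoop 1.x this property defaults to `local`, which selects LocalJobRunner. The sketch below is plain Java with no Hadoop dependency. It assumes the cluster settings live in `mapred-site.xml` under `HADOOP_CONF_DIR` (this may differ on a given install), and the class name `HadoopConfPeek` is made up for illustration. It only reports what the file says, not what the running client actually loaded from its classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: reads name/value pairs from a Hadoop-style XML config.
class HadoopConfPeek {

    // Parses <configuration><property><name>..</name><value>..</value></property>...</configuration>.
    static Map<String, String> parse(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            Map<String, String> props = new HashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    // A missing mapred.job.tracker is treated as "local", the Hadoop 1.x default.
    static boolean isLocalMode(Map<String, String> props) {
        return "local".equals(props.getOrDefault("mapred.job.tracker", "local"));
    }

    public static void main(String[] args) throws Exception {
        String confDir = System.getenv("HADOOP_CONF_DIR");
        if (confDir == null) {
            System.err.println("HADOOP_CONF_DIR is not set");
            return;
        }
        // Assumption: the MapReduce settings are in mapred-site.xml.
        File site = new File(confDir, "mapred-site.xml");
        if (!site.isFile()) {
            System.err.println("Not found: " + site);
            return;
        }
        Map<String, String> props = parse(Files.readAllBytes(site.toPath()));
        System.out.println("mapred.job.tracker = " + props.get("mapred.job.tracker"));
        System.out.println("mapred.reduce.tasks = " + props.get("mapred.reduce.tasks"));
        System.out.println("local mode according to this file: " + isLocalMode(props));
    }
}
```

If this reports a real jobtracker address but the job still runs with one reducer, comparing it with the configuration shown for the job in the jobtracker UI is a reasonable next step.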

A few points that you need to consider:

  1. Are you really running the code on Hadoop, or is it in local mode? (See whether your jobs are visible on the jobtracker and tasktracker.)
  2. Have you exported the HADOOP variables in the environment?
  3. Is the cluster single-node or multi-node?
  4. Even if the cluster has multiple nodes, are all the nodes healthy? Is there an issue with the other nodes?
  5. Are you using setNumReduceTasks correctly? You can reduce your code to a small MapReduce job by removing details (this is just for debugging). Run it and see what happens. If you face the same issue, provide the reduced code in the original question.


 