How do I get the execution time of each mapper and reducer?

I am running Hadoop 2.2.0 on a pseudo-distributed cluster. I tried the following code to get the time taken by each mapper and reducer, but it reports 0 mappers and 0 reducers.

    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");


    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);        
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    List<String> other_args = new ArrayList<String>();
    for(int i=0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " +
                           args[i-1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " +
                         other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.setInputPaths(conf, other_args.get(0));
    FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

    JobClient jobclient = new JobClient(conf);
    RunningJob runjob = jobclient.submitJob(conf);          

    TaskReport[] maps = jobclient.getMapTaskReports(runjob.getID());
    System.out.println("Number of Mappers "+maps.length);
    for (TaskReport rpt : maps) {
      long duration = rpt.getFinishTime() - rpt.getStartTime();
      System.out.println("Mapper duration: " + duration);
    }

    TaskReport[] reduces = jobclient.getReduceTaskReports(runjob.getID());
    System.out.println("Number of Reducers "+reduces.length);
    for (TaskReport rpt : reduces) {
      long duration = rpt.getFinishTime() - rpt.getStartTime();
      System.out.println("Reducer duration: " + duration);
    }

    return 0;

What am I doing wrong?

You are almost there. The only problem is that the TaskReport query runs too soon: submitJob returns right after submission, before the job has made any meaningful progress, so there are no task reports yet. Wait for the job to complete before fetching the reports:

    ...
    RunningJob runjob = jobclient.submitJob(conf); 
    while (!runjob.isComplete()) {
        System.out.println("sleeping for 5 sec...");
        Thread.sleep(5000);
    }
    TaskReport[] maps = jobclient.getMapTaskReports(runjob.getID());
    ...
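
The wait-then-measure pattern above can also be tried without a cluster. The following is only a minimal sketch: `TaskTimingSketch` and `Timing` are hypothetical stand-ins invented for illustration, not Hadoop classes, and treating a finish time earlier than the start time as "not finished yet" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Stand-alone sketch: Timing and TaskTimingSketch are hypothetical stand-ins,
// not Hadoop classes. They only mirror the shape of the answer's loop.
final class TaskTimingSketch {

    // Holds the two values the question reads from each TaskReport.
    static final class Timing {
        final long startMillis;
        final long finishMillis;

        Timing(long startMillis, long finishMillis) {
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }
    }

    // Sleeps pollMillis between checks until isComplete returns true.
    // Returns how many times it slept; stops early if interrupted.
    static int waitUntilComplete(BooleanSupplier isComplete, long pollMillis) {
        int sleeps = 0;
        while (!isComplete.getAsBoolean()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                break;
            }
            sleeps++;
        }
        return sleeps;
    }

    // Returns finish - start per task, or -1 when finish precedes start
    // (assumed here to mean "not finished yet").
    static List<Long> durations(List<Timing> reports) {
        List<Long> result = new ArrayList<>();
        for (Timing t : reports) {
            long d = t.finishMillis - t.startMillis;
            result.add(d >= 0 ? d : -1L);
        }
        return result;
    }
}
```

In the real driver, the `isComplete` check corresponds to `runjob.isComplete()`, and each `Timing` corresponds to the `getStartTime()` and `getFinishTime()` values of one `TaskReport`.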
