
Hadoop 2.7: MapReduce task's total time using streaming API

I am running Hadoop 2.7.1 on a local cluster (all nodes running Ubuntu 14.x or above). My MapReduce programs are written in Python and I am using the streaming API to run them. I want to find the total time that all the MapReduce tasks across all the nodes are taking. How can I do that? I am not able to find the job files (perhaps they were removed from Hadoop 2.x onwards).

If you're looking for the aggregate time spent across all of your tasks, you'll want to look at the job counters. These can be viewed in the job history server by drilling into an individual job and clicking Counters on the left, or you can retrieve them programmatically with `mapred job` commands. For example, to print the summary status of every SUCCEEDED job:

mapred job -list all | grep SUCCEEDED | awk '{ print $1 }' | \
    xargs -n 1 mapred job -status

The closest thing to "aggregate wall time" that counts as consumed time on your cluster is "time spent in occupied slots", reported by the SLOTS_MILLIS_MAPS and SLOTS_MILLIS_REDUCES counters:

mapred job -list all | grep SUCCEEDED | awk '{ print $1 }' | \
    xargs -I {} mapred job -counter {} org.apache.hadoop.mapreduce.JobCounter SLOTS_MILLIS_MAPS
mapred job -list all | grep SUCCEEDED | awk '{ print $1 }' | \
    xargs -I {} mapred job -counter {} org.apache.hadoop.mapreduce.JobCounter SLOTS_MILLIS_REDUCES

The total time that all the MapReduce tasks take is the job's elapsed time. You can view it in the Hadoop web interface at http://ip_address:8088/ (click on the job in question).
