Is there a way to get the shuffle time of each reduce task from the client side using the Hadoop API (Hadoop 1.2.1)? I can get the execution time of the reduce tasks from the JobClient using the getReduceTaskReports(JobID jobID) method, but I wonder whether there is a way to get the percentage of that time that corresponds to the shuffle phase. Thanks in advance.
The solution to the problem was to use Apache Rumen ( http://hadoop.apache.org/docs/r1.2.1/rumen.html ). This framework lets you extract the job history logs in JSON format; with simple JSON parsing I was able to retrieve the information I needed.
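For anyone wanting to see roughly what that parsing step could look like, here is a minimal Python sketch. It is not the code I actually used. It assumes the Rumen trace is a sequence of JSON job objects written one after another (not a JSON array), and that each successful reduce attempt carries `startTime`, `shuffleFinished` and `finishTime` in milliseconds under `reduceTasks` / `attempts`. Please check those field names against your own trace file; the helper names `iter_jobs` and `shuffle_fractions` are made up for illustration.

```python
import json


def iter_jobs(text):
    """Yield each top-level JSON object from a string of concatenated objects.

    Assumption: the trace holds one JSON object per job, back to back.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        job, pos = decoder.raw_decode(text, pos)
        yield job


def shuffle_fractions(job):
    """Map each reduce task ID to the fraction of its run time spent shuffling.

    Assumption: field names (reduceTasks, attempts, result, startTime,
    shuffleFinished, finishTime) match what the Rumen trace contains.
    Only attempts marked SUCCESS are considered.
    """
    fractions = {}
    for task in job.get("reduceTasks", []):
        for attempt in task.get("attempts", []):
            if attempt.get("result") != "SUCCESS":
                continue
            start = attempt["startTime"]
            finish = attempt["finishTime"]
            shuffle_done = attempt["shuffleFinished"]
            total = finish - start
            # Skip attempts with missing or inconsistent timestamps
            if total <= 0 or shuffle_done < start:
                continue
            fractions[task["taskID"]] = (shuffle_done - start) / total
    return fractions
```

Usage would be something like reading the whole trace file into a string, then calling `shuffle_fractions(job)` for each `job` yielded by `iter_jobs(text)`. Multiply the result by 100 if you want a percentage.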