
Hadoop: Measuring shuffle time from Java

Is there a way to get the shuffle time of each reduce task from the client side using the Hadoop API (Hadoop 1.2.1)? I can get the execution time of the reduce tasks from the JobClient using the getReduceTaskReports(JobID jobID) method, but I wonder whether there is a way to get the percentage of that time that corresponds to the shuffle phase. Thanks in advance.

The solution was to use Apache Rumen ( http://hadoop.apache.org/docs/r1.2.1/rumen.html ). This framework lets you retrieve the job history logs in JSON format; with some simple JSON parsing I was able to get the information I needed.
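
For anyone who wants to see the arithmetic, here is a rough sketch of how the shuffle percentage could be computed for one reduce task attempt once the Rumen JSON is available. It assumes each reduce attempt object carries numeric startTime, shuffleFinished and finishTime fields in milliseconds; please check those names against your own trace output. The ShuffleShare class, its methods and the sample values are made up for illustration, and the regex lookup only stands in for a real JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShuffleShare {

    // Pulls a numeric field out of a flat JSON object string.
    // Stand-in for a proper JSON library; fine for a single attempt object.
    static long readLong(String json, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*(-?\\d+)");
        Matcher m = p.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing field: " + field);
        }
        return Long.parseLong(m.group(1));
    }

    // Share of the attempt's wall-clock time spent before the shuffle finished.
    static double shufflePercent(long startTime, long shuffleFinished, long finishTime) {
        long total = finishTime - startTime;
        if (total <= 0 || shuffleFinished < startTime || shuffleFinished > finishTime) {
            throw new IllegalArgumentException("inconsistent timestamps");
        }
        return 100.0 * (shuffleFinished - startTime) / total;
    }

    public static void main(String[] args) {
        // Invented sample attempt; field names are assumed, not taken from a real trace.
        String attempt = "{\"startTime\": 1000, \"shuffleFinished\": 1600, "
                + "\"sortFinished\": 1700, \"finishTime\": 3000}";

        double pct = shufflePercent(
                readLong(attempt, "startTime"),
                readLong(attempt, "shuffleFinished"),
                readLong(attempt, "finishTime"));

        System.out.println("shuffle share of attempt time (%): " + pct);
    }
}
```

Note that the shuffle of a reduce task can overlap with the map phase, so this percentage describes the attempt's own timeline rather than the time spent purely copying data.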
