

Elapsed Time for a Hadoop Task

I have a cluster running YARN. It has 3 DataNodes and 1 client node, and I submit all my jobs from the client node. How can I get the elapsed time of every task in a particular job?

The MapReduce Application Master REST API (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html) can probably be used for this purpose, but I am curious whether there is a Java API that does the same.
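If the REST route turns out to be acceptable, calling it from Java is straightforward. The following is a minimal sketch under stated assumptions: the class name AmTasksRest, the host `rm-host:8088`, and the IDs are hypothetical placeholders, and it assumes the job is still running, because the Application Master serves this API only while the job is active (finished jobs are served by the JobHistory server's REST API instead). It builds the tasks URL through the ResourceManager web proxy and returns the raw JSON; according to the documentation linked above, each task object carries `startTime`, `finishTime` and `elapsedTime` fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: queries the MapReduce AM REST API.
class AmTasksRest {
    // proxyBase is an assumption, e.g. "http://rm-host:8088" (ResourceManager web address)
    static String tasksUrl(String proxyBase, String appId, String jobId) {
        return proxyBase + "/proxy/" + appId + "/ws/v1/mapreduce/jobs/" + jobId + "/tasks";
    }

    // Returns the raw JSON body; parsing it is left to a JSON library of your choice
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Because `elapsedTime` is computed on the cluster side, this route avoids mixing in the client's clock.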

I can find the start time of every task with the getStartTime() method of the TaskReport class. Although the cluster nodes synchronize their clocks with NTP, I don't think it is good practice to compute the elapsed time of running tasks from the client's current time (System.currentTimeMillis()), because even with NTP there can be some accepted clock skew between the nodes of a cluster.
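The concern can be made concrete with a small sketch. The helper TaskElapsed is hypothetical, and it assumes that a finish time of 0 means the task has not finished yet (check this against your Hadoop version). A finished task's duration uses only cluster-side timestamps, while a running task's duration has to mix the cluster's start time with the client's "now", so skew can even make it negative; the sketch clamps that case to zero.

```java
// Hypothetical helper, not part of Hadoop.
class TaskElapsed {
    // All values are epoch milliseconds. Assumption: finishTime == 0 means "still running".
    static long elapsedMillis(long startTime, long finishTime, long now) {
        if (finishTime > 0) {
            // Finished task: both timestamps come from the cluster, no client clock involved
            return finishTime - startTime;
        }
        // Running task: "now" comes from the client (e.g. System.currentTimeMillis()),
        // so clock skew can place it before startTime; clamp instead of returning a negative value
        return Math.max(0L, now - startTime);
    }
}
```

For example, `TaskElapsed.elapsedMillis(report.getStartTime(), report.getFinishTime(), System.currentTimeMillis())` would give an exact value for finished tasks and only an approximation, subject to skew, for running ones.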

The Job class has a method called getTaskReports.

You could use it like this to retrieve the duration of each map task:

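import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskType;

Job job = ...; // your configured job
job.waitForCompletion(true); // blocks until the job finishes; true prints progress

// One report per map task (use TaskType.REDUCE for reduce tasks)
TaskReport[] reports = job.getTaskReports(TaskType.MAP);
for (TaskReport report : reports) {
    // Start and finish times are in milliseconds since the epoch
    long time = report.getFinishTime() - report.getStartTime();
    System.out.println(report.getTaskId() + " took " + time + " millis!");
}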
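Once you have per-task durations, you may want a summary rather than one line per task. The sketch here is illustrative only: TaskTiming and TaskDurationSummary are hypothetical classes (not Hadoop types) standing in for the task ID, start time and finish time that a TaskReport exposes, so the logic can be shown without a cluster.

```java
import java.util.List;

// Hypothetical stand-in for the (taskId, startTime, finishTime) values of a TaskReport
class TaskTiming {
    final String taskId;
    final long startTime;  // epoch millis
    final long finishTime; // epoch millis

    TaskTiming(String taskId, long startTime, long finishTime) {
        this.taskId = taskId;
        this.startTime = startTime;
        this.finishTime = finishTime;
    }

    long durationMillis() {
        return finishTime - startTime;
    }
}

class TaskDurationSummary {
    // Returns the task with the longest duration, or null for an empty list
    static TaskTiming slowest(List<TaskTiming> tasks) {
        TaskTiming worst = null;
        for (TaskTiming t : tasks) {
            if (worst == null || t.durationMillis() > worst.durationMillis()) {
                worst = t;
            }
        }
        return worst;
    }

    // Mean duration in milliseconds, 0.0 for an empty list
    static double averageMillis(List<TaskTiming> tasks) {
        if (tasks.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (TaskTiming t : tasks) {
            total += t.durationMillis();
        }
        return (double) total / tasks.size();
    }
}
```

In a real job, each TaskTiming would be built from `report.getTaskId()`, `report.getStartTime()` and `report.getFinishTime()` inside the loop over `getTaskReports(...)`.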
