Getting an existing MapReduce job from the cluster (the job could be running or completed)
Previously, I was using org.apache.hadoop.mapred.JobClient#getJob(org.apache.hadoop.mapred.JobID) to get the RunningJob.
This call was made from the job-completion callback method. However, there seems to be a timing issue: if the job has already completed, the getJob() method above cannot find it and returns null. I can confirm from the cluster UI that the job did complete.
Setting RunningJob aside, is there a way to get the org.apache.hadoop.mapreduce.Job object for a MapReduce job, given its org.apache.hadoop.mapreduce.JobID, regardless of whether the job is still running or has already completed?
I tried something like this:
Cluster cluster = jobClient.getClusterHandle();
Job job = cluster.getJob(JobID.forName(jobId));
log.info("Trying to get actual job with id {}, found {} on cluster {}", JobID.forName(jobId), job, cluster);
The log shows the correct jobId and a valid cluster object, but cluster.getJob() returns null, so job itself is null.
Is there something I'm missing here?
The problem was caused by a recent YARN upgrade, which required enabling the MR history server on my system. Enabling it fixed the issue. I had upgraded from MR v1 to v2, and in MR v2 all completed jobs are moved to the history server.
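The toy model below shows why the lookup returned null. It is plain Java and does not use Hadoop. `HypotheticalJobLookup`, `JobInfo` and the two maps are illustrative stand-ins, not Hadoop classes.

In the model, a finished job leaves the set of running jobs. After that it is only visible if a history server holds its record. With no history server, a lookup of a completed job comes back empty, much like the null from cluster.getJob() in the question. In a real Hadoop 2 setup, the fix typically means starting the JobHistoryServer (for example with `mr-jobhistory-daemon.sh start historyserver`) and making sure clients can reach it through `mapreduce.jobhistory.address`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Conceptual model only: these types are hypothetical stand-ins, not Hadoop's implementation.
public class HypotheticalJobLookup {
    record JobInfo(String id, String state) {}

    private final Map<String, JobInfo> runningJobs = new HashMap<>();
    // Stays null when no history server is configured.
    private final Map<String, JobInfo> historyServer;

    HypotheticalJobLookup(boolean historyServerEnabled) {
        this.historyServer = historyServerEnabled ? new HashMap<>() : null;
    }

    void submit(String id) {
        runningJobs.put(id, new JobInfo(id, "RUNNING"));
    }

    // A finished job leaves the running set; its record survives only if a history server exists.
    void complete(String id) {
        runningJobs.remove(id);
        if (historyServer != null) {
            historyServer.put(id, new JobInfo(id, "SUCCEEDED"));
        }
    }

    // Check the running jobs first, then fall back to the history server.
    Optional<JobInfo> getJob(String id) {
        JobInfo live = runningJobs.get(id);
        if (live != null) {
            return Optional.of(live);
        }
        if (historyServer == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(historyServer.get(id));
    }

    public static void main(String[] args) {
        for (boolean enabled : new boolean[] {false, true}) {
            HypotheticalJobLookup cluster = new HypotheticalJobLookup(enabled);
            cluster.submit("job_1_0001");
            cluster.complete("job_1_0001");
            // false: Optional.empty, true: Optional[JobInfo[id=job_1_0001, state=SUCCEEDED]]
            System.out.println("history server " + enabled + ": " + cluster.getJob("job_1_0001"));
        }
    }
}
```

Checking the live set before the history store matches the intuition that a running job is answered by whatever is currently executing it. Completed jobs depend entirely on the history store being present.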
You are looking for Cluster#getAllJobStatuses(), which returns a JobStatus[]:
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.JobStatus;

// cluster is an org.apache.hadoop.mapreduce.Cluster, e.g. from jobClient.getClusterHandle()
List<JobStatus> runningJobs = new ArrayList<>();
List<JobStatus> completedJobs = new ArrayList<>();
for (JobStatus job : cluster.getAllJobStatuses()) {
    if (!job.isJobComplete()) {
        runningJobs.add(job);
    } else {
        completedJobs.add(job);
    }
}
// list of running JobIDs
for (JobStatus rjob : runningJobs) {
    System.out.println(rjob.getJobID().toString());
}
// list of completed JobIDs
for (JobStatus cjob : completedJobs) {
    System.out.println(cjob.getJobID().toString());
}
// to print out short report on running jobs:
// displayJobList(runningJobs.toArray(new JobStatus[0]));
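The same running/completed split can be written with streams, and the status list can also be searched for a single job ID, whatever its state. The sketch below runs without Hadoop. `HypotheticalStatus` is an assumed stand-in for org.apache.hadoop.mapreduce.JobStatus, mirroring only its job ID and its completion flag.

With Hadoop on the classpath, you would likely start from `Arrays.stream(cluster.getAllJobStatuses())` and use `JobStatus::isJobComplete` instead. Note that a match gives you a JobStatus, not a Job. You would still pass its getJobID() to cluster.getJob() to get the Job object.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// HypotheticalStatus stands in for org.apache.hadoop.mapreduce.JobStatus so this runs without Hadoop.
public class JobStatusSplit {
    record HypotheticalStatus(String jobId, boolean complete) {}

    // true -> completed jobs, false -> running jobs (same split as the loop above).
    static Map<Boolean, List<HypotheticalStatus>> split(List<HypotheticalStatus> statuses) {
        return statuses.stream()
                .collect(Collectors.partitioningBy(HypotheticalStatus::complete));
    }

    // Find one job by ID, regardless of whether it is running or completed.
    static Optional<HypotheticalStatus> find(List<HypotheticalStatus> statuses, String jobId) {
        return statuses.stream()
                .filter(s -> s.jobId().equals(jobId))
                .findFirst();
    }

    public static void main(String[] args) {
        List<HypotheticalStatus> statuses = List.of(
                new HypotheticalStatus("job_1_0001", true),
                new HypotheticalStatus("job_1_0002", false));
        Map<Boolean, List<HypotheticalStatus>> byState = split(statuses);
        System.out.println("running:   " + byState.get(false));
        System.out.println("completed: " + byState.get(true));
        System.out.println(find(statuses, "job_1_0001").map(HypotheticalStatus::complete)); // Optional[true]
    }
}
```

`Collectors.partitioningBy` always returns both the `true` and `false` keys, so neither lookup needs a null check even when one group is empty.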