Get an existing MapReduce job from the cluster (the job may be running or completed)

Question

Previously, I was using org.apache.hadoop.mapred.JobClient#getJob(org.apache.hadoop.mapred.JobID) to get the RunningJob. This call was made from the job completion callback method. However, there seems to be a timing issue: if the job has already completed, the getJob() method above cannot find it and returns null. I can confirm from the cluster UI that the job was completed.

Setting RunningJob aside, is there a way to get the org.apache.hadoop.mapreduce.Job object for a MapReduce job, given its org.apache.hadoop.mapreduce.JobID, regardless of whether the job is currently running or has completed?

I tried to code up something like this:

  Cluster cluster = jobClient.getClusterHandle();
  Job job = cluster.getJob(JobID.forName(jobId));
  log.info("Trying to get actual job with id {} , found {} on cluster {}",
      JobID.forName(jobId), job, cluster);

I can see the correct jobId and the cluster object, but cluster.getJob() returns null, so the job itself is null.
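Since the jobId looks right, it may help to rule out parsing entirely. As far as I know, JobID.forName splits the string on "_" and expects exactly job_<jtIdentifier>_<sequence> (for example job_1494999999999_0001), throwing IllegalArgumentException for a malformed string rather than returning null; a null from cluster.getJob() therefore points at the lookup, not at the ID. The stdlib-only sketch below re-implements that format check so it can be tested without Hadoop; the class JobIdFormat and its Parts type are hypothetical names, not Hadoop code.

```java
import java.util.Optional;

/**
 * Hypothetical helper (not part of Hadoop): checks that a string has the
 * job_<jtIdentifier>_<sequence> shape that JobID.forName is assumed to expect.
 */
final class JobIdFormat {
    private JobIdFormat() {}

    /** The two pieces of a well-formed job ID string. */
    static final class Parts {
        final String jtIdentifier;
        final int sequence;

        Parts(String jtIdentifier, int sequence) {
            this.jtIdentifier = jtIdentifier;
            this.sequence = sequence;
        }
    }

    /** Returns the parsed parts, or Optional.empty() if the string is malformed. */
    static Optional<Parts> parse(String jobId) {
        if (jobId == null) {
            return Optional.empty();
        }
        // Exactly three underscore-separated parts, the first being "job".
        String[] parts = jobId.split("_");
        if (parts.length != 3 || !parts[0].equals("job")) {
            return Optional.empty();
        }
        try {
            // The sequence is zero-padded ("0001"), so parseInt yields 1.
            return Optional.of(new Parts(parts[1], Integer.parseInt(parts[2])));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }
}
```

Note that a YARN application ID such as application_1494999999999_0001 has the same numeric parts but the wrong prefix, so it is rejected here.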

Is there something I'm missing here?

Answer 1

The problem was caused by a recent YARN upgrade that required enabling the MR history server on my system; enabling it fixed the issue. I had upgraded from MR v1 to v2, and since that upgrade all completed jobs are moved to the history server.
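The reason this fixes the null is that, on MR v2, a finished job is no longer reported by the component that ran it; the client has to find it on the history server instead. The stdlib-only model below illustrates that lookup order under stated assumptions: the JobSource interface, MapJobSource, FallbackLookup and the in-memory maps are all hypothetical stand-ins, not Hadoop classes. Passing null as the history source models a cluster with the history server disabled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical abstraction over "somewhere that can report a job's state". */
interface JobSource {
    Optional<String> findState(String jobId);
}

/** In-memory stand-in for a job source; purely illustrative. */
final class MapJobSource implements JobSource {
    private final Map<String, String> states = new HashMap<>();

    MapJobSource put(String jobId, String state) {
        states.put(jobId, state);
        return this;
    }

    @Override
    public Optional<String> findState(String jobId) {
        return Optional.ofNullable(states.get(jobId));
    }
}

/**
 * Asks the live source (running jobs) first, then the history source
 * (completed jobs). Without a history source a completed job is simply
 * not found, analogous to the null returned by cluster.getJob().
 */
final class FallbackLookup {
    private final JobSource live;
    private final JobSource history; // null models "history server disabled"

    FallbackLookup(JobSource live, JobSource history) {
        this.live = live;
        this.history = history;
    }

    Optional<String> findState(String jobId) {
        Optional<String> state = live.findState(jobId);
        if (state.isPresent() || history == null) {
            return state;
        }
        return history.findState(jobId);
    }
}
```

On a real Hadoop 2.x cluster the corresponding step is starting the JobHistory server (for example with mr-jobhistory-daemon.sh start historyserver) and making sure clients can reach it through the mapreduce.jobhistory.address property (default port 10020).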

Answer 2

You are looking for getAllJobStatuses(), which returns a JobStatus[]:

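  import java.util.ArrayList;
  import java.util.List;

  import org.apache.hadoop.mapreduce.Cluster;
  import org.apache.hadoop.mapreduce.JobStatus;

  // cluster is an org.apache.hadoop.mapreduce.Cluster, e.g. obtained from
  // jobClient.getClusterHandle(); getAllJobStatuses() may throw
  // IOException or InterruptedException.
  List<JobStatus> runningJobs = new ArrayList<JobStatus>();
  List<JobStatus> completedJobs = new ArrayList<JobStatus>();
  for (JobStatus job : cluster.getAllJobStatuses()) {
    if (!job.isJobComplete()) {
      runningJobs.add(job);
    } else {
      completedJobs.add(job);
    }
  }

  // list of running JobIDs
  for (JobStatus rjob : runningJobs) {
    System.out.println(rjob.getJobID().toString());
  }
  // list of completed JobIDs
  for (JobStatus cjob : completedJobs) {
    System.out.println(cjob.getJobID().toString());
  }

  // to print out short report on running jobs:
  // displayJobList(runningJobs.toArray(new JobStatus[0]));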
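If you need one specific job rather than the full list, the same getAllJobStatuses() result can be filtered by ID, and the running/completed split can be written with Collectors.partitioningBy. The sketch below shows that pattern on a hypothetical Status class standing in for org.apache.hadoop.mapreduce.JobStatus: it mirrors only the two accessors the answer uses, getJobID() and isJobComplete(), with the ID simplified to a String, so it compiles without Hadoop on the classpath. StatusQueries is likewise a made-up name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical stand-in for JobStatus; the real getJobID() returns a JobID, not a String. */
final class Status {
    private final String jobId;
    private final boolean complete;

    Status(String jobId, boolean complete) {
        this.jobId = jobId;
        this.complete = complete;
    }

    String getJobID() { return jobId; }

    boolean isJobComplete() { return complete; }
}

final class StatusQueries {
    private StatusQueries() {}

    /** Splits statuses into true -> completed, false -> still running. */
    static Map<Boolean, List<Status>> byCompletion(Status[] all) {
        return Arrays.stream(all).collect(Collectors.partitioningBy(Status::isJobComplete));
    }

    /** Returns the status whose ID matches, if the cluster reported it at all. */
    static Optional<Status> findById(Status[] all, String jobId) {
        return Arrays.stream(all).filter(s -> s.getJobID().equals(jobId)).findFirst();
    }
}
```

partitioningBy always returns both the true and false keys, so the "running" list is empty rather than null when nothing is running.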

2 answers

Answer 1: 5 votes, accepted, 2017-05-19 14:19:51

Answer 2: 1 vote, 2017-05-19 13:59:07