How do you get an instance of the JobTracker in Hadoop?

I know this must be simple, but I cannot figure out how to get an instance of the Hadoop JobTracker. After realizing that I can't call any of its constructors, I tried to instantiate it like so, after submitting a Job:

  JobClient client = new JobClient(conf);
  RunningJob runningJob = client.submitJob(conf);    
  JobTracker jobTracker = JobTracker.startTracker(_conf);

If I take the jobTracker line out, the program runs just fine. When I leave it in, I get an exception saying that the JobTracker already exists for that JobConf.

Does anyone know how I should be getting an instance of the JobTracker? I am currently using Hadoop 1.2.1.

EDIT: I think the exception occurs because the JobTracker is trying to start on a port that already has a JobTracker listening on it. I could start the JobTracker on a different port for my test run, but then I would have two JobTrackers for my Hadoop cluster, which is not how it is supposed to run. I am tempted to stop the existing JobTracker before starting my own (this is entirely a test environment, and it would not be feasible if the system were shared with other Jobs), but this seems to be going down the wrong path.

You don't create an instance of the JobTracker in your client code. The JobClient you have already created has all the methods you can use to interact with the running JobTracker (which is where your job is submitted via the call to submitJob).

Edit

Unfortunately, the hostnames of successful tasks aren't available via the client API. Frustratingly, you can get them from the command line, but that command uses private API calls. If you invoke it and parse its stdout, you will still need to do some string scraping, and the output also includes the task setup and cleanup events:

user@host1:~$ /opt/hadoop/default/bin/hadoop job -events job_201311110747_0001 0 100
Task completion events for job_201311110747_0001
Number of events (from 0) are: 4
SUCCEEDED attempt_201311110747_0001_m_000002_0 http://host1:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_m_000002_0
SUCCEEDED attempt_201311110747_0001_m_000000_0 http://host2:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_m_000000_0
SUCCEEDED attempt_201311110747_0001_r_000000_0 http://host3:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_r_000000_0
SUCCEEDED attempt_201311110747_0001_m_000001_0 http://host4:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_m_000001_0 
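
If you do go the scraping route, the parsing itself is small. Here is a minimal sketch that uses only the JDK and assumes the three-column line layout shown above (status, attempt ID, task log URL). The class name TaskEventScraper and the method successfulMapHosts are made up for illustration, and treating `_m_` in the attempt ID as "map attempt" is an assumption; it makes no attempt to separate out setup or cleanup attempts.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: scrapes the stdout of "hadoop job -events <job-id> <from> <count>".
public class TaskEventScraper {

    // Returns "attemptId @ host" for every SUCCEEDED line whose attempt ID contains "_m_".
    public static List<String> successfulMapHosts(List<String> lines) {
        List<String> result = new ArrayList<String>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            // Event lines have exactly three columns; the two header lines do not.
            if (fields.length != 3 || !"SUCCEEDED".equals(fields[0])) {
                continue;
            }
            String attemptId = fields[1];
            if (!attemptId.contains("_m_")) {
                continue; // assumption: "_m_" marks a map attempt, "_r_" a reduce attempt
            }
            try {
                String host = new URI(fields[2]).getHost();
                if (host != null) {
                    result.add(attemptId + " @ " + host);
                }
            } catch (URISyntaxException e) {
                // Not a parsable URL, so skip the line rather than fail the whole scrape.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample lines copied from the command output above.
        List<String> sample = Arrays.asList(
            "Task completion events for job_201311110747_0001",
            "Number of events (from 0) are: 4",
            "SUCCEEDED attempt_201311110747_0001_m_000002_0 http://host1:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_m_000002_0",
            "SUCCEEDED attempt_201311110747_0001_r_000000_0 http://host3:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_r_000000_0");
        for (String entry : successfulMapHosts(sample)) {
            System.out.println(entry);
        }
    }
}
```

In practice you would feed it the lines read from the command's stdout (for example via ProcessBuilder) instead of the hard-coded sample.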

One option is to use some reflection-based hackery to make the private APIs accessible and then call them as needed. For reference, here are the API calls that replicate the above in code. This is written against Hadoop 1.2.1 and may not be forward or backward compatible with other versions:

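import java.lang.reflect.Method;
import java.net.InetSocketAddress;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobTracker;
import org.apache.hadoop.mapred.TaskCompletionEvent;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JobClientDriver extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // Propagate the tool's return value as the process exit code.
        System.exit(ToolRunner.run(new JobClientDriver(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        JobClient client = new JobClient(new JobConf(conf));

        // createRPCProxy is private in JobClient, so open it up via reflection.
        Method method = JobClient.class.getDeclaredMethod("createRPCProxy", InetSocketAddress.class,
                Configuration.class);
        method.setAccessible(true);

        // The returned proxy talks to the JobTracker that is already running.
        Object rpcClientSubProtocol = method.invoke(client, JobTracker.getAddress(conf), conf);

        Method completeEventsMethod = rpcClientSubProtocol.getClass().getDeclaredMethod("getTaskCompletionEvents",
                JobID.class, int.class, int.class);
        // The proxy class may not be accessible from this package, so open this method up as well.
        completeEventsMethod.setAccessible(true);

        // Fetch up to 100 completion events, starting from event 0.
        for (Object tceObj : ((Object[]) completeEventsMethod.invoke(rpcClientSubProtocol,
            JobID.forName("job_201311110747_0001"), 0, 100))) {
            TaskCompletionEvent tce = (TaskCompletionEvent) tceObj;
            if (tce.isMapTask()) {
                // The task tracker HTTP address carries the hostname the attempt ran on.
                URI uri = new URI(tce.getTaskTrackerHttp());
                System.err.println(tce.getTaskAttemptId() + " @ " + uri.getHost());
            }
        }

        return 0;
    }
}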
