How do you get an instance of the JobTracker in Hadoop?

I know this must be simple, but I cannot figure out how to get an instance of the Hadoop JobTracker. After realizing that I can't call any of its constructors, I am trying to instantiate it like so after I have submitted a job:

  JobClient client = new JobClient(conf);
  RunningJob runningJob = client.submitJob(conf);    
  JobTracker jobTracker = JobTracker.startTracker(conf);

If I take the jobTracker line out, the program runs just fine. When I leave it in, I get an exception saying that the JobTracker already exists for that JobConf.

Does anyone know how I should be getting an instance of the JobTracker? I am currently using Hadoop 1.2.1.

EDIT: I am thinking that the exception I am getting is because the JobTracker is trying to start on a port that already has a JobTracker listening on it. I could possibly start the JobTracker on a different port for my test run, but then I would have two JobTrackers for my Hadoop cluster, which is not how it is supposed to run. I am tempted to try to stop the existing JobTracker before starting my own (this is completely in a test environment, and it would not be feasible if the system were shared with other jobs), but this seems to be going down the wrong path.

You don't create an instance of the JobTracker in your client code; the JobClient you have already created has all the methods you need to interact with the running JobTracker (which is where your job is submitted via the call to submitJob).
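
For example, here is a minimal sketch of driving everything through JobClient and RunningJob (assuming Hadoop 1.2.1's old org.apache.hadoop.mapred API; the class name and the job configuration below are hypothetical placeholders for your own):

import java.io.IOException;

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class JobClientExample {
    public static void main(String[] args) throws IOException {
        // Hypothetical job configuration - in your code this is the JobConf you already built
        JobConf conf = new JobConf(JobClientExample.class);

        JobClient client = new JobClient(conf);
        RunningJob runningJob = client.submitJob(conf);

        // Cluster-wide state reported by the running JobTracker
        ClusterStatus status = client.getClusterStatus();
        System.out.println("Task trackers: " + status.getTaskTrackers());

        // Per-job state reported by the running JobTracker
        System.out.println("Map progress:    " + runningJob.mapProgress());
        System.out.println("Reduce progress: " + runningJob.reduceProgress());

        runningJob.waitForCompletion();
        System.out.println("Succeeded: " + runningJob.isSuccessful());
    }
}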

Edit

Unfortunately, acquiring the successful task hostnames isn't available via the client API. Frustratingly, you can do this via the command line, but it uses private API calls, so you'll still need to perform some string scraping if you were to invoke this command and then parse the stdout (and you'll also get the task setup and cleanup events):

user@host1:~$ /opt/hadoop/default/bin/hadoop job -events job_201311110747_0001 0 100
Task completion events for job_201311110747_0001
Number of events (from 0) are: 4
SUCCEEDED attempt_201311110747_0001_m_000002_0 http://host1:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_m_000002_0
SUCCEEDED attempt_201311110747_0001_m_000000_0 http://host2:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_m_000000_0
SUCCEEDED attempt_201311110747_0001_r_000000_0 http://host3:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_r_000000_0
SUCCEEDED attempt_201311110747_0001_m_000001_0 http://host4:50060/tasklog?plaintext=true&attemptid=attempt_201311110747_0001_m_000001_0 
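
As a rough sketch of that string-scraping approach (the hadoop binary path and job id below are just the examples from the output above, not part of any public API; adjust them for your environment):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

public class TaskEventScraper {
    public static void main(String[] args) throws Exception {
        // Placeholder path and job id taken from the sample output above
        Process p = new ProcessBuilder("/opt/hadoop/default/bin/hadoop",
                "job", "-events", "job_201311110747_0001", "0", "100").start();

        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            // Lines of interest look like:
            // SUCCEEDED attempt_..._m_... http://host:50060/tasklog?...
            if (!line.startsWith("SUCCEEDED")) {
                continue;
            }
            String[] parts = line.trim().split("\\s+");
            System.out.println(parts[1] + " ran on " + new URI(parts[2]).getHost());
        }
        p.waitFor();
    }
}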

An option could be to use some reflection-based hackery to make the private APIs visible and then use them as needed. For reference, here are the API calls you need to replicate the above in code (this may not be forward or backward compatible with different versions of Hadoop; this is for 1.2.1):

import java.lang.reflect.Method;
import java.net.InetSocketAddress;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobTracker;
import org.apache.hadoop.mapred.TaskCompletionEvent;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JobClientDriver extends Configured implements Tool {
    public static void main(String args[]) throws Exception {
        ToolRunner.run(new JobClientDriver(), args);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        JobClient client = new JobClient(new JobConf(conf));

        // JobClient.createRPCProxy is private - open it up via reflection to obtain
        // the JobSubmissionProtocol proxy that talks to the running JobTracker
        Method method = JobClient.class.getDeclaredMethod("createRPCProxy", InetSocketAddress.class,
                Configuration.class);
        method.setAccessible(true);

        Object rpcClientSubProtocol = method.invoke(client, JobTracker.getAddress(conf), conf);

        // JobSubmissionProtocol is package-private, so look its method up reflectively
        // and make it accessible too
        Method completeEventsMethod = rpcClientSubProtocol.getClass().getDeclaredMethod("getTaskCompletionEvents",
                JobID.class, int.class, int.class);
        completeEventsMethod.setAccessible(true);

        // Fetch up to 100 task completion events for the job and print the host of each map attempt
        for (Object tceObj : ((Object[]) completeEventsMethod.invoke(rpcClientSubProtocol,
                JobID.forName("job_201311110747_0001"), 0, 100))) {
            TaskCompletionEvent tce = (TaskCompletionEvent) tceObj;
            if (tce.isMapTask()) {
                URI uri = new URI(tce.getTaskTrackerHttp());
                System.err.println(tce.getTaskAttemptId() + " @ " + uri.getHost());
            }
        }

        return 0;
    }
}
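
Packaged into a jar (the jar name below is just a placeholder), this can be run like any other Tool driver and prints each successful map attempt and the host it ran on, mirroring the command-line output above:

user@host1:~$ hadoop jar jobclient-driver.jar JobClientDriver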
