Hadoop JobClient getJob method takes too long to execute

Question

I'm trying to fetch a currently running job from a Hadoop 2.6.0 cluster using its jobId.

My input is the jobId of a currently running Hadoop job. Using the RunningJob object, I want to fetch details about that job.

I'm using the Hadoop 2.x Java API.

For this I used the following code:

JobID jobID = JobID.forName(jobId);
Configuration conf = new Configuration();
// port is the cluster's RPC port; it is not shown in the original snippet
JobClient client = new JobClient(new InetSocketAddress(ip, port), conf);
RunningJob job = client.getJob(jobID);

If the job is currently in the RUNNING state, the getJob() call takes too long to execute.

I don't understand why it takes so much time to get the RunningJob object.

I tried the same thing on Hadoop 1.1.2, using the Hadoop 1.x Java API, and did not face this issue there.

Answer 1

I had the same issue today. Hadoop's JobClient needs three mandatory configuration properties to track your job on YARN:

1. yarn.resourcemanager.address

2. mapreduce.jobhistory.address

3. mapreduce.framework.name

The JobClient should then be created in the following way:

Configuration conf = new Configuration();
conf.set("mapreduce.framework.name", "yarn");
// Both address properties take a host:port string
conf.set("yarn.resourcemanager.address", jobTrackerIp);
conf.set("mapreduce.jobhistory.address", jobHistoryIp);
JobClient client = new JobClient(conf);

Once it is created this way, the JobClient is ready for use. Hadoop suppresses some of the errors internally and falls back on retry logic, which is why no error shows up in our code and the call only appears to be slow.
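
Since a missing property leads to silent retries rather than an error, it may help to check the three settings before building the client. Below is a minimal sketch in plain Java, assuming the settings are held in a Map before being copied into the Configuration. The class JobClientConfigCheck and the method missingKeys are hypothetical helpers, not part of Hadoop, and the host name is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: reports which of the three
// properties named in the answer are absent or blank.
final class JobClientConfigCheck {

    // The three property names listed in the answer above.
    static final List<String> REQUIRED_KEYS = Arrays.asList(
            "mapreduce.framework.name",
            "yarn.resourcemanager.address",
            "mapreduce.jobhistory.address");

    // Returns the required keys that are missing or blank, in the order above.
    static List<String> missingKeys(Map<String, String> settings) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = settings.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.framework.name", "yarn");
        // Placeholder host:port value, for illustration only.
        settings.put("yarn.resourcemanager.address", "rm.example.com:8032");

        // The history server address was never set, so it is reported.
        System.out.println(missingKeys(settings));
    }
}
```

If missingKeys returns an empty list, each entry can be passed to conf.set(key, value) before calling new JobClient(conf), as in the answer's snippet. This only confirms that the values are present, not that the hosts are reachable.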
