
Hadoop jar input path issue

The issue I'm having is that the hadoop jar command requires an input path, but my MapReduce job gets its input from a database and hence doesn't need or have an input directory. I've set the JobConf input format to DBInputFormat, but how do I signify this when submitting my jarred job?

//Here is the command
hadoop jar <my-jar> <hdfs input> <hdfs output>

I have an output folder, but don't need an input folder. Is there a way to circumvent this? Do I need to write a second program that pulls the DB data into a folder and then use that in the MapReduce job?

The hadoop jar command itself requires no arguments beyond the jar (and possibly the main class); anything after that is passed straight through to your program, so your map/reduce driver decides which command-line arguments it takes. If the job no longer needs an HDFS input path, change the driver code so it stops expecting one. A typical file-based driver looks like this:

public class MyJob extends Configured implements Tool
{
   @Override
   public int run(String[] args) throws Exception {
     Job job = Job.getInstance(getConf(), "my job");
     // ... mapper, reducer, and output key/value setup ...
     TextInputFormat.setInputPaths(job, new Path(args[0])); // or some other file input format
     TextOutputFormat.setOutputPath(job, new Path(args[1]));
     return job.waitForCompletion(true) ? 0 : 1;
   }
}

So you would remove the input path statement. There is no magic in JAR'ing the job up; just change the InputFormat (which you said you did) and you should be set.
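
For reference, here is a minimal sketch of a driver configured for DBInputFormat, using the newer org.apache.hadoop.mapreduce API (your question mentions JobConf, i.e. the older mapred API, but the idea is the same). The JDBC driver class, connection URL, credentials, the "employees" table, its columns, and the MyRecord and MyDbJob names are all hypothetical placeholders, not taken from your job:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDbJob extends Configured implements Tool {

    // Value type that DBInputFormat hands to the mapper, one instance per table row.
    public static class MyRecord implements Writable, DBWritable {
        private long id;
        private String name;

        public void readFields(ResultSet rs) throws SQLException {      // populate from a JDBC row
            id = rs.getLong("id");
            name = rs.getString("name");
        }
        public void write(PreparedStatement stmt) throws SQLException { // only used for DB output
            stmt.setLong(1, id);
            stmt.setString(2, name);
        }
        public void readFields(DataInput in) throws IOException {       // Hadoop serialization
            id = in.readLong();
            name = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(name);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        // The JDBC connection details take the place of an HDFS input path.
        // Driver class, URL, and credentials below are placeholders.
        DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost:3306/mydb",
                "dbuser", "dbpassword");

        Job job = Job.getInstance(conf, "db input job");
        job.setJarByClass(MyDbJob.class);

        // Read rows from the (hypothetical) "employees" table instead of a directory.
        // setInput also registers DBInputFormat as the job's input format.
        DBInputFormat.setInput(job, MyRecord.class,
                "employees",          // table name
                null,                 // optional WHERE conditions
                "id",                 // ORDER BY column
                "id", "name");        // columns to select

        // Set your Mapper/Reducer and output key/value classes here.

        // Only the output path is taken from the command line now: args[0].
        TextOutputFormat.setOutputPath(job, new Path(args[0]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDbJob(), args));
    }
}

You would then submit it with only an output path, e.g. hadoop jar <my-jar> MyDbJob <hdfs output>, since DBInputFormat gets everything it needs from the job configuration rather than from an input directory.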
