
hadoop - Hadoop jar input path issue

The issue I'm having is that the hadoop jar command requires an input path, but my MapReduce job gets its input from a database and hence doesn't need/have an input directory. I've set the JobConf input format to DBInputFormat, but how do I signify this when submitting the jarred job?

//Here is the command
hadoop jar <my-jar> <hdfs input> <hdfs output>

I have an output folder, but don't need an input folder. Is there a way to circumvent this? Or do I need to write a second program that pulls the DB data into a folder and then use that folder as the input for the MapReduce job?
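
For context, the rows are read into a DBWritable value class along these lines. This is only a rough sketch of what I mean, not my real code: the MyRecord class name, the employees table and its id/name columns are all placeholders.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;

// One row of the (hypothetical) "employees" table.
public class MyRecord implements Writable, DBWritable {
    public long id;
    public String name;

    // DBWritable: populate the record from a JDBC ResultSet row.
    public void readFields(ResultSet rs) throws SQLException {
        id = rs.getLong("id");
        name = rs.getString("name");
    }

    // DBWritable: write the record into a PreparedStatement (only used when writing back to a DB).
    public void write(PreparedStatement ps) throws SQLException {
        ps.setLong(1, id);
        ps.setString(2, name);
    }

    // Writable: Hadoop's own serialization, used between tasks.
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        name = in.readUTF();
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(name);
    }
}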

The hadoop jar command itself requires no command-line arguments beyond the jar (and possibly the main class, if it is not set in the jar's manifest). The command-line arguments for your map/reduce job are decided by the program itself. So if it no longer requires an HDFS input path, then you need to change the code to not require one.

public class MyJob extends Configured implements Tool
{
   // Tool's run() returns an int exit code
   public int run(String[] args) throws Exception {
     // ...
     TextInputFormat.setInputPaths(job, new Path(args[0])); // or some other file input format
     TextOutputFormat.setOutputPath(job, new Path(args[1]));
     // ...
     return 0;
   }
}

So you would remove the input path statement. There is no magic in JAR'ing the job up; just change the InputFormat (which you said you did) and you should be set.
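
To make that concrete, here is a minimal sketch of such a driver, assuming the old mapred API (since you mentioned JobConf), the hypothetical MyRecord class sketched in the question, and made-up JDBC connection details. The only remaining command-line argument is the HDFS output directory; the input comes entirely from the database via DBInputFormat:

import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDbJob extends Configured implements Tool {

    // Map-only job: dump each database row as a line of text (sketch only).
    public static class RowMapper extends MapReduceBase
            implements Mapper<LongWritable, MyRecord, Text, NullWritable> {
        public void map(LongWritable key, MyRecord row,
                        OutputCollector<Text, NullWritable> out, Reporter reporter)
                throws IOException {
            out.collect(new Text(row.id + "\t" + row.name), NullWritable.get());
        }
    }

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), MyDbJob.class);
        conf.setJobName("db-input-example");

        // Input comes from the database, so there is no file input path at all.
        conf.setInputFormat(DBInputFormat.class);
        DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",              // hypothetical JDBC driver
                "jdbc:mysql://dbhost:3306/mydb",      // hypothetical connection URL
                "user", "password");
        DBInputFormat.setInput(conf, MyRecord.class,
                "employees",          // table
                null,                 // WHERE conditions
                "id",                 // ORDER BY
                "id", "name");        // columns to read

        conf.setMapperClass(RowMapper.class);
        conf.setNumReduceTasks(0);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(NullWritable.class);

        // The only command-line argument left is the HDFS output directory.
        FileOutputFormat.setOutputPath(conf, new Path(args[0]));

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDbJob(), args));
    }
}

You would then launch it with only the output path, for example: hadoop jar my-job.jar MyDbJob /user/me/output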
