
How to read multiple files from multiple directories in Map-Reduce

I want to read multiple files from multiple directories in a Map-Reduce program. I tried giving the paths in the main method:

FileInputFormat.setInputPaths(conf,new Path("hdfs://localhost:54310/user/test/"));
FileInputFormat.setInputPaths(conf,new Path("hdfs://localhost:54310/Test/test1/"));

But it reads from only one of them.

What should I do to read multiple files?

Please suggest a solution.

Thanks.

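FileInputFormat#setInputPaths sets the input paths and overwrites any paths that were set earlier, so the second call in the question replaces the first. Use FileInputFormat#addInputPath or FileInputFormat#addInputPaths to add to the existing paths.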
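To make the overwrite-versus-append difference concrete, here is a minimal sketch in plain Java. It is not Hadoop code: the class InputPathsModel and its methods are hypothetical stand-ins that only mimic the "set replaces, add appends" behaviour described above, so it can run without a cluster or the Hadoop jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of a job's input-path list.
 * It only imitates the set/add semantics; it is not Hadoop's implementation.
 */
public class InputPathsModel {
    // Stands in for the input directories stored in the job configuration.
    private final List<String> inputPaths = new ArrayList<>();

    /** "set" semantics: discard earlier paths, then store the given ones. */
    public void setInputPaths(String... paths) {
        inputPaths.clear();
        inputPaths.addAll(Arrays.asList(paths));
    }

    /** "add" semantics: keep earlier paths and append one more. */
    public void addInputPath(String path) {
        inputPaths.add(path);
    }

    /** Returns a copy of the current input-path list. */
    public List<String> getInputPaths() {
        return new ArrayList<>(inputPaths);
    }

    public static void main(String[] args) {
        String first = "hdfs://localhost:54310/user/test/";
        String second = "hdfs://localhost:54310/Test/test1/";

        // Pattern from the question: two "set" calls, the second wins.
        InputPathsModel overwritten = new InputPathsModel();
        overwritten.setInputPaths(first);
        overwritten.setInputPaths(second);
        System.out.println(overwritten.getInputPaths().size()); // prints 1

        // Pattern from the answer: two "add" calls, both paths are kept.
        InputPathsModel appended = new InputPathsModel();
        appended.addInputPath(first);
        appended.addInputPath(second);
        System.out.println(appended.getInputPaths().size()); // prints 2
    }
}
```

In the actual driver, the corresponding fix would be to call FileInputFormat.addInputPath(conf, new Path(...)) once per directory, or to pass both Path objects to a single FileInputFormat.setInputPaths call.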

To pass multiple input files from different directories, only the driver code needs to change. Use MultipleInputs in the driver as shown below.
CODE:
// Uses the org.apache.hadoop.mapreduce API: Job, lib.input.MultipleInputs,
// lib.input.TextInputFormat and lib.output.FileOutputFormat.
public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "MultipleDirectoryAsInput");
    job.setJarByClass(DriverClass.class);

    // No job.setMapperClass(...) here: a second call would just replace the
    // first, and MultipleInputs assigns a mapper to each input path below.
    job.setReducerClass(ReducerClass.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    // args[0] and args[1] are the two input directories, args[2] is the output directory.
    MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Map1Class.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Map2Class.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));

    return job.waitForCompletion(true) ? 0 : 1;
}



 