
Reading 2 input files in hadoop mapreduce

I need to read two different input files and write two output files. The first file is the main input; the second serves as a dictionary. My job should handle both files at the same time, in the mapper and in the reducers. As I understand it, I can't use MultipleInputs. I tried using BufferedReader and BufferedWriter, but then I have to create another job in the mapper and another in the reducer. How can I solve this problem?

You can use multiple input files with MultipleInputs; refer to http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/MultipleInputs.html .

MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, inputPath2, TextInputFormat.class, MyMapper.class);

Each of inputPath1, inputPath2, etc. can itself point to more than one file.
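With MultipleInputs the usual pattern is a reduce-side join: each mapper tags its records with the source file, and the reducer matches main records against the dictionary entry sharing the same key. Here is a minimal plain-Java sketch of that flow, with no Hadoop dependency; the "M:"/"D:" tags, the tab-separated record layout, and the method names are illustrative assumptions, not Hadoop APIs:

```java
import java.util.*;

public class ReduceSideJoin {
    // "Map" phase: tag each record with its source ("M:" = main file, "D:" = dictionary),
    // emitting (key, taggedValue) pairs just as two tagging mappers would
    static List<String[]> mapPhase(List<String> mainLines, List<String> dictLines) {
        List<String[]> emitted = new ArrayList<>();
        for (String line : mainLines) {
            String[] kv = line.split("\t", 2);
            emitted.add(new String[]{kv[0], "M:" + kv[1]});
        }
        for (String line : dictLines) {
            String[] kv = line.split("\t", 2);
            emitted.add(new String[]{kv[0], "D:" + kv[1]});
        }
        return emitted;
    }

    // Shuffle + "Reduce" phase: group tagged values by key, then join each
    // main-file record with the dictionary entry for that key
    static Map<String, String> reducePhase(List<String[]> emitted) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : emitted)
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            String dict = null;
            List<String> mains = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v.startsWith("D:")) dict = v.substring(2);
                else mains.add(v.substring(2));
            }
            for (String m : mains)
                out.put(e.getKey(), m + " -> " + (dict == null ? "UNKNOWN" : dict));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mainFile = Arrays.asList("1\thello", "2\tworld");
        List<String> dictFile = Arrays.asList("1\tbonjour");
        System.out.println(reducePhase(mapPhase(mainFile, dictFile)));
        // prints {1=hello -> bonjour, 2=world -> UNKNOWN}
    }
}
```

In an actual job, the two tagging mappers would be registered via MultipleInputs.addInputPath (one call per path, each with its own mapper class), and the join logic above would live in the reducer's reduce() method.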

If your second file is small, you can use the Distributed Cache and read the file in the mappers for processing. Refer to http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
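With the Distributed Cache, each mapper loads the small dictionary into memory once and performs the join while mapping, so no reducer-side coordination is needed (a map-side join). A plain-Java sketch of that pattern, where setup() and map() mirror the Hadoop Mapper hooks of the same names; the tab-separated layout and class name are illustrative assumptions:

```java
import java.util.*;

public class MapSideJoin {
    // In-memory copy of the small dictionary file
    private final Map<String, String> dict = new HashMap<>();

    // Mirrors Mapper.setup(): parse the cached dictionary file once per mapper;
    // in a real job the lines would be read from the local cached file path
    void setup(List<String> cachedDictLines) {
        for (String line : cachedDictLines) {
            String[] kv = line.split("\t", 2);
            dict.put(kv[0], kv[1]);
        }
    }

    // Mirrors Mapper.map(): look up each main-file record in the dictionary
    // and emit the joined record
    String map(String line) {
        String[] kv = line.split("\t", 2);
        return kv[0] + "\t" + kv[1] + " -> " + dict.getOrDefault(kv[0], "UNKNOWN");
    }
}
```

This only works when the dictionary fits comfortably in each mapper's heap; for a large second file, fall back to the MultipleInputs reduce-side join above.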
