
Reading 2 input files in hadoop mapreduce

I need to read two different input files and write two output files. The first file is the main input; the second serves as a dictionary. My job should handle both files at the same time, in the mapper and in the reducers. As I understand it, I can't use MultipleInputs. I tried using BufferedReader and BufferedWriter, but then I have to create another job in the mapper and another in the reducer. How can I solve this problem?

You can use multiple input files with MultipleInputs; refer to http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/lib/MultipleInputs.html .

MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, inputPath2, TextInputFormat.class, MyMapper.class);

Each of inputPath1, inputPath2, etc. can itself point to more than one file.
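With MultipleInputs the usual pattern is a reduce-side join: each mapper tags its records with the source file, and the reducer matches main records against the dictionary entry sharing the same key. Here is a minimal plain-Java sketch of that flow, with no Hadoop dependency; the "M:"/"D:" tags, the tab-separated record layout, and the method names are illustrative assumptions, not Hadoop APIs:

```java
import java.util.*;

public class ReduceSideJoin {
    // "Map" phase: tag each record with its source ("M:" = main file, "D:" = dictionary),
    // emitting (key, taggedValue) pairs just as two tagging mappers would
    static List<String[]> mapPhase(List<String> mainLines, List<String> dictLines) {
        List<String[]> emitted = new ArrayList<>();
        for (String line : mainLines) {
            String[] kv = line.split("\t", 2);
            emitted.add(new String[]{kv[0], "M:" + kv[1]});
        }
        for (String line : dictLines) {
            String[] kv = line.split("\t", 2);
            emitted.add(new String[]{kv[0], "D:" + kv[1]});
        }
        return emitted;
    }

    // Shuffle + "Reduce" phase: group tagged values by key, then join each
    // main-file record with the dictionary entry for that key
    static Map<String, String> reducePhase(List<String[]> emitted) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : emitted)
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            String dict = null;
            List<String> mains = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v.startsWith("D:")) dict = v.substring(2);
                else mains.add(v.substring(2));
            }
            for (String m : mains)
                out.put(e.getKey(), m + " -> " + (dict == null ? "UNKNOWN" : dict));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mainFile = Arrays.asList("1\thello", "2\tworld");
        List<String> dictFile = Arrays.asList("1\tbonjour");
        System.out.println(reducePhase(mapPhase(mainFile, dictFile)));
        // prints {1=hello -> bonjour, 2=world -> UNKNOWN}
    }
}
```

In an actual job, the two tagging mappers would be registered via MultipleInputs.addInputPath (one call per path, each with its own mapper class), and the join logic above would live in the reducer's reduce() method.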

If your second file is small, you can use the Distributed Cache and read the file in the mappers for processing. Refer to http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html
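With the Distributed Cache, each mapper loads the small dictionary into memory once and performs the join while mapping, so no reducer-side coordination is needed (a map-side join). A plain-Java sketch of that pattern, where setup() and map() mirror the Hadoop Mapper hooks of the same names; the tab-separated layout and class name are illustrative assumptions:

```java
import java.util.*;

public class MapSideJoin {
    // In-memory copy of the small dictionary file
    private final Map<String, String> dict = new HashMap<>();

    // Mirrors Mapper.setup(): parse the cached dictionary file once per mapper;
    // in a real job the lines would be read from the local cached file path
    void setup(List<String> cachedDictLines) {
        for (String line : cachedDictLines) {
            String[] kv = line.split("\t", 2);
            dict.put(kv[0], kv[1]);
        }
    }

    // Mirrors Mapper.map(): look up each main-file record in the dictionary
    // and emit the joined record
    String map(String line) {
        String[] kv = line.split("\t", 2);
        return kv[0] + "\t" + kv[1] + " -> " + dict.getOrDefault(kv[0], "UNKNOWN");
    }
}
```

This only works when the dictionary fits comfortably in each mapper's heap; for a large second file, fall back to the MultipleInputs reduce-side join above.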
