Implementing multiple mappers and a single reducer in Hadoop

I am new to Hadoop. I have multiple folders containing files that I want to process with Hadoop, and I am unsure how to implement the mapper side of the MapReduce algorithm. Can I specify multiple mappers to process the multiple input files, and then combine everything into one output using a single reducer? If so, please give guidelines for implementing these steps.

If you have multiple input files, use the MultipleInputs class.

Its addInputPath() method can be used to:

  1. add multiple paths that share one common mapper implementation, or
  2. add multiple paths, each with its own custom mapper and input format implementation.

To end up with a single reducer, have every mapper emit the same output key...say 1 or "abc". That way all records are routed to the same reduce call. (Alternatively, you can set the number of reduce tasks to 1 on the job.)
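A minimal driver sketch of the above, assuming the Hadoop 2.x `mapreduce` API; `FolderOneMapper`, `FolderTwoMapper`, and `MergeReducer` are hypothetical classes you would supply yourself:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multiple mappers, one reducer");
        job.setJarByClass(MultiInputDriver.class);

        // One entry per input folder, each with its own mapper class.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, FolderOneMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, FolderTwoMapper.class);

        // Force a single reduce task so all mapper output lands in one reducer.
        job.setNumReduceTasks(1);
        job.setReducerClass(MergeReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Whether you force one reduce task or rely on a common key, make sure the mappers' output key/value types agree, since they all feed the same reducer.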

If the files are all to be mapped in the same way (e.g., they all have the same format and processing requirements), then you can configure a single mapper to process all of them.

You do this by configuring the TextInputFormat class:

String folder1 = "file:///home/chrisgerken/blah/blah/folder1";
String folder2 = "file:///home/chrisgerken/blah/blah/folder2";
String folder3 = "file:///home/chrisgerken/blah/blah/folder3";
TextInputFormat.setInputPaths(job, new Path(folder1), new Path(folder2), new Path(folder3));

This will result in all of the files in folders 1, 2 and 3 being processed by the mapper.

Of course, if your files need a different input type, you'll have to configure that input format class appropriately instead.
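For instance, if every line were a tab-separated key/value pair, you might swap in KeyValueTextInputFormat (a sketch I'm adding for illustration, not part of the original answer; setInputPaths is inherited from FileInputFormat):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// Tell the job which input format to use, then register the same folders.
// (job, folder1, folder2, folder3 are assumed to be defined as above.)
job.setInputFormatClass(KeyValueTextInputFormat.class);
KeyValueTextInputFormat.setInputPaths(job,
        new Path(folder1), new Path(folder2), new Path(folder3));
```

With this format the mapper receives the text before the first tab as the key and the rest of the line as the value, instead of the byte-offset/line pair that TextInputFormat produces.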
