Implementing multiple mappers and a single reducer in Hadoop

I am new to Hadoop. I have multiple folders containing files that I want to process with Hadoop, and I am unsure how to implement the mapper side of the MapReduce algorithm. Can I specify multiple mappers to process the multiple input files, and then combine everything into one output using a single reducer? If so, please give guidelines for implementing these steps.

If you have multiple input files, use the MultipleInputs class.

Its addInputPath() method can be used to:

  1. add multiple paths that share one common mapper implementation, or
  2. add multiple paths, each with its own custom mapper and input format implementation.

To end up with a single reducer, have every mapper emit the same output key...say 1 or "abc". That way all records are routed to the same reduce call. (Alternatively, you can set the number of reduce tasks to 1 on the job.)
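A minimal driver sketch of the above, assuming the Hadoop 2.x `mapreduce` API; `FolderOneMapper`, `FolderTwoMapper`, and `MergeReducer` are hypothetical classes you would supply yourself:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multiple mappers, one reducer");
        job.setJarByClass(MultiInputDriver.class);

        // One entry per input folder, each with its own mapper class.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, FolderOneMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, FolderTwoMapper.class);

        // Force a single reduce task so all mapper output lands in one reducer.
        job.setNumReduceTasks(1);
        job.setReducerClass(MergeReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Whether you force one reduce task or rely on a common key, make sure the mappers' output key/value types agree, since they all feed the same reducer.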

If the files are all to be mapped in the same way (e.g., they all have the same format and processing requirements), then you can configure a single mapper to process all of them.

You do this by configuring the TextInputFormat class:

String folder1 = "file:///home/chrisgerken/blah/blah/folder1";
String folder2 = "file:///home/chrisgerken/blah/blah/folder2";
String folder3 = "file:///home/chrisgerken/blah/blah/folder3";
TextInputFormat.setInputPaths(job, new Path(folder1), new Path(folder2), new Path(folder3));

This will result in all of the files in folders 1, 2 and 3 being processed by the mapper.

Of course, if your files need a different input type, you'll have to configure that input format class appropriately instead.
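For instance, if every line were a tab-separated key/value pair, you might swap in KeyValueTextInputFormat (a sketch I'm adding for illustration, not part of the original answer; setInputPaths is inherited from FileInputFormat):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

// Tell the job which input format to use, then register the same folders.
// (job, folder1, folder2, folder3 are assumed to be defined as above.)
job.setInputFormatClass(KeyValueTextInputFormat.class);
KeyValueTextInputFormat.setInputPaths(job,
        new Path(folder1), new Path(folder2), new Path(folder3));
```

With this format the mapper receives the text before the first tab as the key and the rest of the line as the value, instead of the byte-offset/line pair that TextInputFormat produces.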
