Can a reducer take multiple inputs from different mappers?

Question

Can a reducer take multiple inputs? The key type is the same for both mappers, but the value types differ. The first input is the output of a MapReduce program that emits (Text, FloatWritable) pairs, with Text as the key and FloatWritable as the value. The second comes from a mapper that emits (Text, SongStats), where SongStats is a custom data type implementing Writable. I want a single reducer to use both the output of the earlier MapReduce job and the output of the second mapper in its calculations. I think the output of the first MapReduce program may be too big to hold in the distributed cache. Any pointers would help. I am writing the programs in Java.

Answer 1

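Not directly. A reducer accepts one key type and one value type, fixed by the signature of its reduce method:

public void reduce(Text key, Iterable<IntWritable> values,
                   Context context) throws IOException, InterruptedException {
    // every element of values has the one declared value type (here IntWritable)
}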

Your best bet is to write a new MapReduce job that uses MultipleInputs to assign a separate mapper to each input: one for the output of the previous MapReduce job (Text, FloatWritable) and one for the data behind your other mapper (Text, SongStats). Have both mappers convert their values to a common type (e.g. (Text, Text), or whatever best suits your needs) so that a single reducer can consume them.
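To make the "common type" idea concrete, here is a minimal sketch of the usual tagging trick, written in plain Java so it runs without a cluster. It is a simulation only and does not use any Hadoop classes. The tags "F" and "S", the class name TaggedJoinSketch, and the two SongStats fields (a play count and a duration in seconds) are assumptions made up for illustration. In a real job, each tagging method would live in its own mapper and emit a Text value, and the two mappers would be registered with MultipleInputs.addInputPath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join; no Hadoop classes are used.
final class TaggedJoinSketch {

    // Hypothetical tags marking which mapper produced a value.
    static final String SCORE_TAG = "F";
    static final String STATS_TAG = "S";

    // Value the first mapper would emit: the float, prefixed with its tag.
    static String tagScore(float score) {
        return SCORE_TAG + "\t" + score;
    }

    // Value the second mapper would emit: the SongStats fields (assumed here
    // to be a play count and a duration in seconds), prefixed with its tag.
    static String tagStats(int plays, int seconds) {
        return STATS_TAG + "\t" + plays + "\t" + seconds;
    }

    // Stand-in for the shuffle phase: group emitted (key, value) pairs by key.
    static Map<String, List<String>> shuffle(List<String[]> emitted) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : emitted) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Stand-in for reduce(): all values share one type (String), and the
    // leading tag tells the reducer which source each value came from.
    static String reduce(String key, Iterable<String> values) {
        Float score = null; // stays null if no score arrived for this key
        int plays = 0;
        int seconds = 0;
        for (String value : values) {
            String[] parts = value.split("\t");
            if (SCORE_TAG.equals(parts[0])) {
                score = Float.parseFloat(parts[1]);
            } else if (STATS_TAG.equals(parts[0])) {
                plays += Integer.parseInt(parts[1]);
                seconds += Integer.parseInt(parts[2]);
            }
        }
        return key + " score=" + score + " plays=" + plays + " seconds=" + seconds;
    }

    public static void main(String[] args) {
        List<String[]> emitted = new ArrayList<>();
        emitted.add(new String[] {"songA", tagScore(0.5f)});
        emitted.add(new String[] {"songA", tagStats(3, 200)});
        emitted.add(new String[] {"songB", tagStats(1, 180)});
        for (Map.Entry<String, List<String>> e : shuffle(emitted).entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

Because nothing is held in memory beyond the values for one key, this approach should also sidestep the concern about the first job's output being too large for the distributed cache. A string tag is just the simplest option; a small custom Writable carrying a type flag would likely be more efficient.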

Sources:

https://hadoop.apache.org/docs/r2.9.0/api/org/apache/hadoop/mapreduce/Reducer.html

https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html

solution1
0 ACCEPTED 2018-05-14 14:59:31
