
Can reducer take multiple inputs from different mappers?

Can a reducer take multiple inputs? The key type is the same for both mappers, but the value types differ. The first input is the output of a MapReduce program that emits (Text, FloatWritable), where Text is the key and FloatWritable is the value type. The second is a mapper that emits (Text, SongStats), where SongStats is a custom data type implementing Writable. I want one reducer to consume both the output of the earlier MapReduce job and the output of the second mapper in its calculations. I think the output of the first MapReduce program may be too big to hold in the distributed cache. Any pointers would help. I am writing the programs in Java.

No. A reducer accepts a single input key type and a single input value type, both fixed by its method signature:

public void reduce(Text key, Iterable<IntWritable> values,
                   Context context) throws IOException, InterruptedException {
    // all values for a key arrive as one type (IntWritable here)
}

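Your best bet is to write a new MapReduce job that uses MultipleInputs to assign a separate mapper to each input: one mapper reads the output of the previous MapReduce job (Text, FloatWritable), the other handles the data that produces (Text, SongStats). Both mappers convert their values to a common type (e.g. Text, Text, or whatever best suits your needs) and emit them under the same key, so a single reducer receives both. Because the two inputs are joined on the reduce side, the first job's output does not need to fit in the distributed cache.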
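One way to pick the common type is a tagged value that records which mapper produced it. The following is only a minimal sketch under assumptions, not the asker's actual code: the class name TaggedValue, the tag constants, and the single playCount field standing in for SongStats are all hypothetical. To keep it runnable without Hadoop on the classpath, it uses only java.io and mirrors the write/readFields shape of Writable instead of implementing the interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical common value type that both mappers would emit.
// In a real job it would implement org.apache.hadoop.io.Writable.
class TaggedValue {
    static final byte FLOAT_TAG = 0; // value from the first job's (Text, FloatWritable) output
    static final byte STATS_TAG = 1; // value from the (Text, SongStats) mapper

    byte tag;
    float score;   // used when tag == FLOAT_TAG
    int playCount; // assumed stand-in for the SongStats fields

    static TaggedValue ofFloat(float score) {
        TaggedValue v = new TaggedValue();
        v.tag = FLOAT_TAG;
        v.score = score;
        return v;
    }

    static TaggedValue ofStats(int playCount) {
        TaggedValue v = new TaggedValue();
        v.tag = STATS_TAG;
        v.playCount = playCount;
        return v;
    }

    // Same shape as Writable.write: the tag goes first so the reader knows what follows.
    void write(DataOutput out) throws IOException {
        out.writeByte(tag);
        if (tag == FLOAT_TAG) out.writeFloat(score);
        else out.writeInt(playCount);
    }

    // Same shape as Writable.readFields.
    void readFields(DataInput in) throws IOException {
        tag = in.readByte();
        if (tag == FLOAT_TAG) score = in.readFloat();
        else playCount = in.readInt();
    }

    // Serializes and deserializes a value, as the shuffle would between map and reduce.
    static TaggedValue roundTrip(TaggedValue v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            TaggedValue copy = new TaggedValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // What a reduce() body might do: branch on the tag to tell the two sources apart.
    static String combine(Iterable<TaggedValue> values) {
        float score = 0f;
        int plays = 0;
        for (TaggedValue v : values) {
            if (v.tag == FLOAT_TAG) score = v.score;
            else plays += v.playCount;
        }
        return score + "," + plays;
    }

    public static void main(String[] args) {
        List<TaggedValue> values = Arrays.asList(
            roundTrip(ofFloat(2.5f)), roundTrip(ofStats(3)), roundTrip(ofStats(4)));
        System.out.println(combine(values)); // prints 2.5,7
    }
}
```

In an actual job, each input path would be registered with MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass), both mappers would emit (Text, TaggedValue), and the reducer would be declared with TaggedValue as its input value type. The first job's output also has to be stored in a format the chosen input format can read back.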

Sources:

https://hadoop.apache.org/docs/r2.9.0/api/org/apache/hadoop/mapreduce/Reducer.html

https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html

