
Hadoop - how to use and reduce multiple inputs?

Mapper/Reducer 1 --> (key,value)
                      /   |   \
                     /    |    \
     Mapper/Reducer 2     |    Mapper/Reducer 4
     -> (oKey,oValue)     |    -> (xKey, xValue)
                          |
                          |
                    Mapper/Reducer 3
                    -> (aKey, aValue)

I have a log file, which I aggregate with MR1. Mapper2, Mapper3, and Mapper4 each take the output of MR1 as their input. The jobs are chained.

MR1 Output:

User     {infos of user:[{data here},{more data},{etc}]}
..

MR2 Output:

timestamp       idCount
..

MR3 Output:

timestamp        loginCount
..

MR4 Output:

timestamp        someCount
..

I want to combine the outputs of MR2-4 into one final output:

timestamp     idCount     loginCount   someCount
..
..
..

Is there a way w/o Pig or Hive? I'm using Java.

You can use MultipleInputs. See the example here.
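To make the idea concrete: with MultipleInputs you would typically register each of the three output directories with its own mapper via MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass), have each mapper emit the timestamp as the key and a source-tagged count as the value, and let a single reducer assemble one row per timestamp. Below is a rough plain-Java simulation of that tag-and-merge logic only. It does not use any Hadoop classes; the class name ReduceSideJoinSketch, the "source:count" tag format, and the choice to fill missing counts with "0" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of a reduce-side join; no Hadoop classes are used. */
final class ReduceSideJoinSketch {

    /** "Map" step: parse one "timestamp count" line and tag the count with its source index. */
    static String[] tag(String line, int source) {
        String[] parts = line.trim().split("\\s+");
        // key = timestamp, value = "<source>:<count>" (hypothetical tag format)
        return new String[] { parts[0], source + ":" + parts[1] };
    }

    /** "Reduce" step: for each timestamp, put every tagged count into its own column. */
    static List<String> join(List<List<String>> inputs) {
        // TreeMap keeps timestamps sorted (as strings), like keys arriving at a reducer.
        Map<String, String[]> columnsByTimestamp = new TreeMap<>();
        for (int source = 0; source < inputs.size(); source++) {
            for (String line : inputs.get(source)) {
                String[] kv = tag(line, source);
                String[] columns = columnsByTimestamp.computeIfAbsent(kv[0], k -> {
                    String[] c = new String[inputs.size()];
                    Arrays.fill(c, "0"); // assumption: a missing count is reported as 0
                    return c;
                });
                String[] tagged = kv[1].split(":", 2);
                columns[Integer.parseInt(tagged[0])] = tagged[1];
            }
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String[]> e : columnsByTimestamp.entrySet()) {
            rows.add(e.getKey() + "\t" + String.join("\t", e.getValue()));
        }
        return rows;
    }
}
```

Calling ReduceSideJoinSketch.join with the MR2, MR3, and MR4 lines (in that order) returns rows of the form timestamp, idCount, loginCount, someCount. In a real job the tagging would live in three small mapper classes and the merge in the reducer.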

As far as I know, a reducer class can't produce an array of outputs. Here is what comes to mind for solving your problem:

The output key of MR1 would be one of {a, b, c}, and the value would be the matching pair: {timestamp, idCount}, {timestamp, loginCount}, or {timestamp, someCount}. You would then combine MR2-4 into a single job.

So the process would look like this:

MR1 <inputKey, inputValue, outputKey, outputValue> where outputKey is
                                       "a" for outputValue {timestamp, idCount}
                                       "b" for outputValue {timestamp, loginCount}
                                       "c" for outputValue {timestamp, someCount}

MR2-4 <inputKey, inputValue, outputKey, outputValue> if inputKey is "a" do MR2
                                                     if inputKey is "b" do MR3
                                                     if inputKey is "c" do MR4
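The branching described above can be sketched in plain Java as a switch on the tag. This is only an illustration of the dispatch step, not Hadoop code: the class name TagDispatchSketch and the "tag<TAB>timestamp,count" record layout are assumptions, and in a real combined job the switch would sit inside the map or reduce method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of routing tagged MR1 records to the MR2/MR3/MR4 logic. */
final class TagDispatchSketch {

    /** Maps the tag emitted by MR1 to the downstream stage named in the answer. */
    static String stageFor(String tag) {
        switch (tag) {
            case "a": return "MR2"; // value is {timestamp, idCount}
            case "b": return "MR3"; // value is {timestamp, loginCount}
            case "c": return "MR4"; // value is {timestamp, someCount}
            default: throw new IllegalArgumentException("unknown tag: " + tag);
        }
    }

    /** Groups "tag<TAB>value" records (hypothetical layout) by the stage that handles them. */
    static Map<String, List<String>> dispatch(List<String> records) {
        Map<String, List<String>> byStage = new LinkedHashMap<>();
        for (String record : records) {
            String[] kv = record.split("\t", 2);
            byStage.computeIfAbsent(stageFor(kv[0]), s -> new ArrayList<>()).add(kv[1]);
        }
        return byStage;
    }
}
```

Failing fast on an unknown tag is a design choice here; silently dropping such records would hide bugs in MR1.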

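Also, Hadoop lets you plug in a Partitioner and a grouping comparator (set with Job.setGroupingComparatorClass). With these you can control how {key/value} pairs are routed and grouped, so that the mapper/reducer can treat the key plus some part of the value as the effective key.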
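As a rough illustration of that idea, the sketch below uses a composite key of the form "timestamp#tag" and applies both the partitioning rule and the grouping rule to the timestamp part only, so all tags for one timestamp land in the same partition and the same group. It is plain Java, not a Hadoop Partitioner or comparator: the class name CompositeKeySketch and the "#" separator are assumptions. In an actual job the same logic would go into a Partitioner subclass (getPartition) and a comparator registered with Job.setGroupingComparatorClass.

```java
import java.util.Comparator;

/** Plain-Java sketch of partitioning and grouping on part of a composite key. */
final class CompositeKeySketch {

    /** Returns the natural key (timestamp) of a composite key "timestamp#tag". */
    static String naturalKey(String compositeKey) {
        int sep = compositeKey.indexOf('#'); // '#' is a hypothetical separator
        return sep < 0 ? compositeKey : compositeKey.substring(0, sep);
    }

    /** Partitioning rule: hash only the natural key, masked to stay non-negative. */
    static int partition(String compositeKey, int numPartitions) {
        return (naturalKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Grouping rule: keys compare as equal when their timestamps match. */
    static final Comparator<String> GROUPING =
            Comparator.comparing(CompositeKeySketch::naturalKey);
}
```

The tag part of the key can still be used for sorting within a group, which is what makes this pattern useful for getting the counts to the reducer in a fixed column order.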


 