                 Mapper/Reducer 1 --> (key,value)
                  /        |        \
                 /         |         \
Mapper/Reducer 2           |          Mapper/Reducer 4
-> (oKey,oValue)           |          -> (xKey, xValue)
                           |
                           |
                  Mapper/Reducer 3
                  -> (aKey, aValue)
I have a log file, which I aggregate with MR1. Mapper2, Mapper3 and Mapper4 each take the output of MR1 as their input. The jobs are chained.
MR1 Output:
User {infos of user:[{data here},{more data},{etc}]}
..
MR2 Output:
timestamp idCount
..
MR3 Output:
timestamp loginCount
..
MR4 Output:
timestamp someCount
..
I want to combine the outputs of MR2-4 into one final output:
timestamp idCount loginCount someCount
..
..
..
Is there a way to do this without Pig or Hive? I'm using Java.
You can use MultipleInputs. See the example here.
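To make the join idea concrete, here is a minimal sketch of the tag-and-merge logic in plain Java, with no Hadoop classes, so it runs on its own. The class name `CountJoinSketch`, the tab-separated `timestamp<TAB>count` line format, and the choice to print 0 for a missing count are all my assumptions, not something from the question. In a real job the tagging would live in three mappers, each registered with `MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass)`, and the merge would live in the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of a reduce-side join; nothing here is Hadoop API.
final class CountJoinSketch {
    // Hypothetical source tags, one per upstream job (MR2, MR3, MR4).
    static final int ID = 0, LOGIN = 1, SOME = 2;

    // "Map" step: parse one "timestamp<TAB>count" line and store the count
    // in the column that belongs to its source.
    static void tag(String line, int source, Map<String, long[]> byTimestamp) {
        String[] parts = line.split("\t");
        long[] counts = byTimestamp.computeIfAbsent(parts[0], k -> new long[3]);
        counts[source] += Long.parseLong(parts[1].trim());
    }

    // "Reduce" step: one output line per timestamp, columns in a fixed order.
    // Assumption: a count that is missing for a timestamp is written as 0.
    static List<String> join(List<String> mr2, List<String> mr3, List<String> mr4) {
        Map<String, long[]> byTimestamp = new TreeMap<>(); // keys sorted, like reducer input
        for (String line : mr2) tag(line, ID, byTimestamp);
        for (String line : mr3) tag(line, LOGIN, byTimestamp);
        for (String line : mr4) tag(line, SOME, byTimestamp);

        List<String> out = new ArrayList<>();
        for (Map.Entry<String, long[]> e : byTimestamp.entrySet()) {
            long[] c = e.getValue();
            out.add(e.getKey() + "\t" + c[ID] + "\t" + c[LOGIN] + "\t" + c[SOME]);
        }
        return out;
    }
}
```

Calling `CountJoinSketch.join(...)` with the three lists of lines returns rows of the form `timestamp idCount loginCount someCount`, which is the final output asked for in the question.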
As far as I know, a reducer class can't emit an array of outputs. Here is what comes to mind for your problem:
The output key of MR1 would be one of {a, b, c},
and the value would be the matching pair:
{timestamp, idCount},
{timestamp, loginCount}
or {timestamp, someCount}.
You would then combine MR2-4 into a single job.
So the process would look like this:
MR1<inputKey, inputValue, outputKey, outputValue>, where outputKey is
"a" for outputValue {timestamp, idCount}
"b" for outputValue {timestamp, loginCount}
"c" for outputValue {timestamp, someCount}
MR2-4<inputKey, inputValue, outputKey, outputValue>: if inputKey is "a", do MR2
if inputKey is "b", do MR3
if inputKey is "c", do MR4
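Also, you can set a custom Partitioner and a grouping comparator on the job (Job.setPartitionerClass and Job.setGroupingComparatorClass).
These let you control how {key, value} pairs are routed and grouped, so the mapper/reducer can treat key + some part of the value
as the key.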
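Here is a rough sketch of what the partitioner and grouping comparator would compute, again in plain Java rather than as real Hadoop subclasses. The composite key layout `"<timestamp>#<tag>"` and the class name `CompositeKeySketch` are hypothetical, picked only for illustration. The point is that both decisions look only at the timestamp part, so records tagged "a", "b" and "c" for the same timestamp land in the same reducer and in the same reduce call.

```java
// Plain-Java sketch of composite-key routing; not a Hadoop Partitioner subclass.
final class CompositeKeySketch {
    // Hypothetical key layout: "<timestamp>#<tag>", for example "t1#a".
    // Assumption: every key contains a '#'; otherwise substring throws.
    static String naturalKey(String compositeKey) {
        return compositeKey.substring(0, compositeKey.indexOf('#'));
    }

    // Partitioner idea: pick the reducer from the timestamp part only,
    // with the same formula Hadoop's default HashPartitioner applies to a key.
    static int partition(String compositeKey, int numReducers) {
        return (naturalKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Grouping-comparator idea: two keys belong to the same reduce call
    // when their timestamp parts are equal, whatever the tag is.
    static int compareForGrouping(String left, String right) {
        return naturalKey(left).compareTo(naturalKey(right));
    }
}
```

In an actual job, `partition` would become the body of `getPartition` in a `Partitioner` subclass and `compareForGrouping` the body of `compare` in a comparator passed to `Job.setGroupingComparatorClass`.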