
Hadoop - how to use and reduce multiple inputs?

Mapper/Reducer 1 --> (key,value)
                      /   |   \
                     /    |    \
     Mapper/Reducer 2     |    Mapper/Reducer 4
     -> (oKey,oValue)     |    -> (xKey, xValue)
                          |
                          |
                    Mapper/Reducer 3
                    -> (aKey, aValue)

I have a logfile, which I aggregate with MR1. Mapper2, Mapper3, and Mapper4 each take the output of MR1 as their input. The jobs are chained.

MR1 Output:

User     {infos of user:[{data here},{more data},{etc}]}
..

MR2 Output:

timestamp       idCount
..

MR3 Output:

timestamp        loginCount
..

MR4 Output:

timestamp        someCount
..

I want to combine the outputs from MR2-4 into one final output:

timestamp     idCount     loginCount   someCount
..
..
..

Is there a way to do this without Pig or Hive? I'm using Java.

You can use MultipleInputs; see the example here.
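As a rough illustration of what such a job does, the sketch below simulates the tag-and-merge logic in plain Java with no Hadoop dependency. In a real job, each of the three output directories would be registered with `MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass)`, each with its own mapper that tags the count with its source, and a single reducer would merge the tagged values per timestamp. The class `JoinSketch`, the tag names (`id`, `login`, `some`), the tab-separated output, and the default of `0` for a missing count are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinSketch {
    // Column order of the final output; the tag names are hypothetical.
    static final String[] TAGS = {"id", "login", "some"};

    // What each per-input mapper would emit: key = timestamp, value = "tag=count".
    static String[] tag(String sourceTag, String line) {
        String[] parts = line.trim().split("\\s+");
        return new String[] {parts[0], sourceTag + "=" + parts[1]};
    }

    // What the single reducer would do for one timestamp: place each tagged
    // count into its column. A missing count stays "0" (an assumed default).
    static String reduce(String timestamp, List<String> taggedValues) {
        String[] columns = {"0", "0", "0"};
        for (String value : taggedValues) {
            String[] kv = value.split("=", 2);
            for (int i = 0; i < TAGS.length; i++) {
                if (TAGS[i].equals(kv[0])) {
                    columns[i] = kv[1];
                }
            }
        }
        return timestamp + "\t" + String.join("\t", columns);
    }

    // Stand-in for the shuffle: group tagged values by timestamp, sorted by key.
    static List<String> run(Map<String, List<String>> linesBySource) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> source : linesBySource.entrySet()) {
            for (String line : source.getValue()) {
                String[] kv = tag(source.getKey(), line);
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.add(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

Tagging the value rather than the key keeps the timestamp as the only join key, so all three counts for one timestamp arrive in the same reduce call.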

As far as I know, you can't have an array of outputs in a reducer class. What comes to mind to solve your problem is the following:

The output key of MR1 would be one of {a, b, c}, and the value would be the matching pair: {timestamp, idCount} for a, {timestamp, loginCount} for b, or {timestamp, someCount} for c. MR2-4 are then combined into a single job.

So the process would look like this:

MR1 <inputKey, inputValue, outputKey, outputValue> where outputKey is
                                       "a" for outputValue {timestamp, idCount}
                                       "b" for outputValue {timestamp, loginCount}
                                       "c" for outputValue {timestamp, someCount}

MR2-4 <inputKey, inputValue, outputKey, outputValue> if inputKey is "a" do MR2
                                                     if inputKey is "b" do MR3
                                                     if inputKey is "c" do MR4
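The dispatch step above might look roughly like this in plain Java. It is only a sketch of the branching logic, not Hadoop code: the class `TagDispatch`, the method `map`, and the output format are hypothetical, and in a real job this logic would sit inside a `Mapper` that receives the tag as its input key.

```java
class TagDispatch {
    // Takes one MR1 record (tag, timestamp, count) and routes it to the
    // branch that would otherwise be a separate job (MR2, MR3, or MR4).
    // Returns the output as "timestamp<TAB>column=count".
    static String map(String tag, String timestamp, long count) {
        switch (tag) {
            case "a": // MR2 branch: idCount
                return timestamp + "\t" + "idCount=" + count;
            case "b": // MR3 branch: loginCount
                return timestamp + "\t" + "loginCount=" + count;
            case "c": // MR4 branch: someCount
                return timestamp + "\t" + "someCount=" + count;
            default:
                // Fail loudly on a tag MR1 is not expected to produce.
                throw new IllegalArgumentException("unknown tag: " + tag);
        }
    }
}
```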

Also, Hadoop has a Partitioner and a grouping comparator, with which you can control how {key/value} pairs are routed and grouped, so that the mapper/reducer can treat key+some_part_of_value as the key.
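To make the key+some_part_of_value idea concrete, here is a minimal plain-Java sketch with no Hadoop dependency. In Hadoop the two pieces would be a `Partitioner` subclass and a comparator registered via `Job.setGroupingComparatorClass`; the `CompositeKey` class, its field names, and the choice of a timestamp as the value part folded into the key are assumptions for illustration.

```java
import java.util.Comparator;

class CompositeKey {
    final String naturalKey;  // the original key, e.g. "a"
    final String valuePart;   // part of the value folded into the key, e.g. a timestamp

    CompositeKey(String naturalKey, String valuePart) {
        this.naturalKey = naturalKey;
        this.valuePart = valuePart;
    }

    // Partitioner role: choose the reducer from the natural key only, so all
    // records sharing that key reach the same reducer.
    static int partition(CompositeKey key, int numReducers) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort order: natural key first, then the value part.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.naturalKey)
                  .thenComparing(k -> k.valuePart);

    // Grouping comparator role: keys that differ only in the value part
    // compare as equal, so they would be handed to one reduce call.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing((CompositeKey k) -> k.naturalKey);
}
```

With this split, records are sorted by the full composite key but grouped by the natural key alone, so a reducer would see the values for one key already ordered by the value part.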
