How to perform a reduce-side join in Java MapReduce on two tables that have a many-to-many relationship?
First of all, I am not sure whether it is possible or not. If it is possible, I am still not sure whether it is the correct way of doing it.
What I have is:
What I want is:
The problem I have is:
If there is a many-to-many relationship between the two files on the join keys, how can I perform the join with Hadoop MapReduce in Java?
As you can see from the illustration below, A has 4 matching rows for a1=x and B has 2 matching rows for b1=x. Thus, joining the two tables on a1=b1=x produces 4*2 = 8 rows (combinations), as shown in the last table. I could not manage this with a reduce-side join, because it means increasing the number of key-value pairs, which seems to go against the nature of MapReduce.
How can I perform such a thing?
Why this is a problem:
Let's say table A is:
a1 a2 a3 a4
x 1 somevalue somevalue
x 2 somevalue somevalue
x 3 somevalue somevalue
x 4 somevalue somevalue
Let's say table B is:
b1 b2 b3 b4 b5
x i somevalue somevalue somevalue
x j somevalue somevalue somevalue
The result of joining the two files on a1=b1:
a1 a2 b2
x 1 i
x 2 i
x 3 i
x 4 i
x 1 j
x 2 j
x 3 j
x 4 j
A full join will always produce M x N output rows for each key, where M and N are the numbers of rows carrying that key in the two tables.
Note that, with a reduce-side join, the number of intermediate key-value pairs emitted by the mappers is still N + M; it is the reducer that performs the Cartesian product. So there is nothing wrong with that. Since you control the reducer, you can do further filtering and output only what you need.
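To make the N + M versus M x N point concrete, here is a minimal sketch in plain Java that only simulates the three phases (map, shuffle, reduce) in memory. It does not use the Hadoop API; the class name `ReduceSideJoinSketch`, the `"A"`/`"B"` tags, and the two-column row layout (join key, then a2 or b2) are assumptions made for illustration. In a real job the same logic would presumably live in a `Mapper` per input file and a single `Reducer`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Map phase: emit exactly one (joinKey, taggedValue) pair per input row.
    // The tag records which table the row came from, so the reducer can
    // tell the two sides apart. row[0] is the join key, row[1] the payload.
    static void map(String tag, List<String[]> rows, List<String[]> out) {
        for (String[] row : rows) {
            out.add(new String[] {row[0], tag + "\t" + row[1]});
        }
    }

    // Reduce phase for one key: split the values by tag, then emit the
    // Cartesian product. This is where M + N inputs become M x N outputs.
    static List<String> reduce(String key, List<String> taggedValues) {
        List<String> fromA = new ArrayList<>();
        List<String> fromB = new ArrayList<>();
        for (String value : taggedValues) {
            String[] parts = value.split("\t", 2);
            if (parts[0].equals("A")) {
                fromA.add(parts[1]);
            } else {
                fromB.add(parts[1]);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String b : fromB) {
            for (String a : fromA) {
                joined.add(key + " " + a + " " + b);
            }
        }
        return joined;
    }

    // Driver: run both "mappers", group the pairs by key (the shuffle),
    // and call reduce once per key.
    static List<String> join(List<String[]> tableA, List<String[]> tableB) {
        List<String[]> intermediate = new ArrayList<>();
        map("A", tableA, intermediate);
        map("B", tableB, intermediate);

        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : intermediate) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }

        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            output.addAll(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Tables A and B from the question, reduced to (key, a2) and (key, b2).
        List<String[]> tableA = Arrays.asList(
            new String[] {"x", "1"}, new String[] {"x", "2"},
            new String[] {"x", "3"}, new String[] {"x", "4"});
        List<String[]> tableB = Arrays.asList(
            new String[] {"x", "i"}, new String[] {"x", "j"});
        for (String row : join(tableA, tableB)) {
            System.out.println(row);
        }
    }
}
```

With the tables from the question, the mappers emit 4 + 2 = 6 pairs and the reducer for key x writes 4 x 2 = 8 rows. Note that this sketch buffers both sides of a key in memory. In Hadoop the reducer's value iterator can only be traversed once, so at least one side has to be copied into a list in the same way; for very large key groups a secondary sort that delivers one table's rows first is a common way to buffer only that side.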