FULL OUTER JOIN on multiple keys

I have two tables (A and B) that I want to merge on three fields (department_id, region_id, origin_id). Each table contains a value field. What I'd like to do is combine the two tables and get the difference between the two values. The issue I'm having is that a given key combination doesn't necessarily exist in both tables (none of department_id, region_id, or origin_id contains nulls). When I run the following query, I get many more records (~2x) than I expect, so I'm wondering if the query is wrong. This is in Hive.

SELECT
  COALESCE(A.department_id, B.department_id) AS department_id,
  COALESCE(A.region_id, B.region_id) AS region_id,
  COALESCE(A.origin_id, B.origin_id) AS origin_id,
  COALESCE(A.value, CAST(0 AS BIGINT)) - COALESCE(B.value, CAST(0 AS BIGINT)) AS delta_value
FROM
  A FULL OUTER JOIN B
  ON A.department_id = B.department_id
  AND A.region_id = B.region_id
  AND A.origin_id = B.origin_id

If you are getting more records than you expect, that is probably because there are duplicate key combinations in one or both tables. Run these queries to see where the duplicates are:

select department_id, region_id, origin_id, count(*)
from a
group by department_id, region_id, origin_id
having count(*) >= 2;

select department_id, region_id, origin_id, count(*)
from b
group by department_id, region_id, origin_id
having count(*) >= 2;
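
To quantify the gap, one option is to count the distinct key combinations across both tables. If neither table has duplicate keys, the FULL OUTER JOIN should return exactly this many rows. This is only a sketch; the aliases all_keys and distinct_keys and the column name expected_rows are made up for illustration.

```sql
-- Sketch: number of rows the FULL OUTER JOIN should return
-- when each key combination appears at most once per table.
-- Aliases (all_keys, distinct_keys, expected_rows) are illustrative.
SELECT COUNT(*) AS expected_rows
FROM (
  SELECT DISTINCT department_id, region_id, origin_id
  FROM (
    SELECT department_id, region_id, origin_id FROM a
    UNION ALL
    SELECT department_id, region_id, origin_id FROM b
  ) all_keys
) distinct_keys;
```

If the join returns more rows than this count, the extra rows come from duplicated keys.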

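For each combination of key values, any join pairs every matching row from one table with every matching row from the other, so duplicates multiply: a key that appears twice in A and three times in B produces 2 × 3 = 6 rows in the result. If both tables have duplicates, you get this Cartesian product per key.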
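
If the duplicates are legitimate, one possible fix is to collapse each table to one row per key before joining. The sketch below assumes that duplicate rows for a key should be summed, which the question does not state; swap in MAX, MIN, or another deduplication rule if that fits the data better. The aliases a_agg and b_agg are illustrative.

```sql
-- Sketch: pre-aggregate each side so every key appears once, then join.
-- Assumption: values of duplicate rows for the same key should be summed.
SELECT
  COALESCE(a_agg.department_id, b_agg.department_id) AS department_id,
  COALESCE(a_agg.region_id, b_agg.region_id) AS region_id,
  COALESCE(a_agg.origin_id, b_agg.origin_id) AS origin_id,
  COALESCE(a_agg.value, CAST(0 AS BIGINT)) - COALESCE(b_agg.value, CAST(0 AS BIGINT)) AS delta_value
FROM
  (SELECT department_id, region_id, origin_id, SUM(value) AS value
   FROM A
   GROUP BY department_id, region_id, origin_id) a_agg
FULL OUTER JOIN
  (SELECT department_id, region_id, origin_id, SUM(value) AS value
   FROM B
   GROUP BY department_id, region_id, origin_id) b_agg
  ON a_agg.department_id = b_agg.department_id
  AND a_agg.region_id = b_agg.region_id
  AND a_agg.origin_id = b_agg.origin_id;
```

With each subquery grouped on the full key, the join can match at most one row from each side per key, so the result has one row per distinct key combination.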
