BigQuery - Left Join two tables ON a column OR another
I want to perform a left join on two tables. The field I will join by is an email address. As the table on the left has two email fields which may be different, I want that if joining by the first email fails and returns null values, a second join is performed on the other email field. Lastly, I want to throw away those entries which have not been matched by either of the two joins.
I have thought of doing something like this:
SELECT *
FROM a
LEFT JOIN b
ON
a.address1 = b.email
OR a.address2 = b.email
However, this returned an error message saying "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join".
What is the correct way of achieving this?
You might be able to phrase your logic using a union:
-- BigQuery needs an explicit UNION DISTINCT or UNION ALL; a bare UNION is rejected.
SELECT * FROM a LEFT JOIN b ON a.address1 = b.email
UNION DISTINCT
SELECT * FROM a LEFT JOIN b ON a.address2 = b.email;
OR in JOIN conditions makes it really hard to optimize queries. So, BigQuery makes it hard to use OR.
I think you can rephrase the query by doing:
SELECT *
FROM a
CROSS JOIN UNNEST(ARRAY[address1, address2]) AS address
LEFT JOIN b
  ON address = b.email;
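To see what this kind of query produces, here is a minimal self-contained sketch with inline sample data. The id column and the sample addresses are made up for illustration, and the WHERE clause is an addition (not part of the answer above) that drops addresses with no partner in b, as the question asks.

```sql
-- Hypothetical sample data; only address1, address2 and email come from the question.
WITH a AS (
  SELECT 1 AS id, 'x@example.com' AS address1, 'y@example.com' AS address2 UNION ALL
  SELECT 2, 'nobody@example.com', 'z@example.com' UNION ALL
  SELECT 3, 'nobody@example.com', 'nothing@example.com'
),
b AS (
  SELECT 'x@example.com' AS email UNION ALL
  SELECT 'z@example.com'
)
SELECT a.id, address, b.email
FROM a
-- Turn the two address columns into one row per address.
CROSS JOIN UNNEST([a.address1, a.address2]) AS address
LEFT JOIN b
  ON address = b.email
-- Added filter: keep only the addresses that found a partner in b.
WHERE b.email IS NOT NULL
ORDER BY a.id;
```

With this sample data, row 1 matches through address1, row 2 matches through address2, and row 3 disappears. A row whose two addresses both match would appear twice, so this does not by itself give address1 priority over address2.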
Consider the BigQuery approach below:
SELECT * EXCEPT(row_key, priority)
FROM (
SELECT *, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, 1) priority
FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
ON a.address1 = b.email
UNION ALL
SELECT *, TO_JSON_STRING(a), IF(b.email IS NULL, 3, 2)
FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
ON a.address2 = b.email
)
WHERE true
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1
Or you can use a less verbose, more generic version of the above, which avoids redundant code fragments and can be useful if you need more than just two conditions:
SELECT * EXCEPT(row_key, priority)
FROM (
SELECT a.*, b.*, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, c.priority) priority
FROM `project.dataset.tableA` a,
UNNEST([STRUCT(address1 as address, 1 as priority), (address2, 2)]) c
LEFT JOIN `project.dataset.tableB` b
ON c.address = b.email
)
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1
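If email is unique in b, another way to get the fallback behaviour might be to join b twice and pick the first match. This is only a sketch under that assumption; the id and label columns and the sample data are hypothetical.

```sql
-- Hypothetical sample data; assumes each email appears at most once in b.
WITH a AS (
  SELECT 1 AS id, 'x@example.com' AS address1, 'y@example.com' AS address2 UNION ALL
  SELECT 2, 'nobody@example.com', 'z@example.com' UNION ALL
  SELECT 3, 'nobody@example.com', 'nothing@example.com'
),
b AS (
  SELECT 'x@example.com' AS email, 'first' AS label UNION ALL
  SELECT 'y@example.com', 'second' UNION ALL
  SELECT 'z@example.com', 'third'
)
SELECT
  a.id,
  -- Prefer the match on address1; fall back to the match on address2.
  COALESCE(b1.email, b2.email) AS matched_email,
  IF(b1.email IS NOT NULL, b1.label, b2.label) AS label
FROM a
LEFT JOIN b AS b1 ON a.address1 = b1.email
LEFT JOIN b AS b2 ON a.address2 = b2.email
-- Discard rows where neither address found a match.
WHERE COALESCE(b1.email, b2.email) IS NOT NULL
ORDER BY a.id;
```

Because each ON clause is a plain equality, this avoids the OR condition from the question. With the sample data, row 1 resolves to x@example.com even though its second address also matches, row 2 falls back to z@example.com, and row 3 is dropped.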