
BigQuery - Left Join two tables ON a column OR another

I want to perform a left join on two tables, joining on an email address. The table on the left has two email fields, which may differ, so if the join on the first email fails and returns null values, I want to perform a second join on the other email field. Lastly, I want to discard the rows that were not matched by either of the two joins.
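To pin down the intended semantics, here is a plain-Python reference model of the requirement (not BigQuery code): each row of `a` is matched on `address1` first, on `address2` only if that finds nothing, and dropped if neither matches. The function name `fallback_join`, the sample rows, and the assumption that each email appears at most once in `b` are all hypothetical.

```python
from typing import Optional


def fallback_join(a_rows: list[dict], b_rows: list[dict]) -> list[dict]:
    """Join each row of `a` to `b` on address1, falling back to address2.

    Rows of `a` that match on neither address are dropped.
    Assumes (hypothetically) that each email appears at most once in `b`.
    """
    # Index b by email; NULL (None) emails never match, as in SQL.
    b_by_email = {row["email"]: row for row in b_rows if row["email"] is not None}

    result = []
    for a in a_rows:
        match: Optional[dict] = None
        # Try address1 first, then address2.
        for column in ("address1", "address2"):
            key = a[column]
            if key is not None and key in b_by_email:
                match = b_by_email[key]
                break
        if match is not None:  # discard rows matched by neither address
            result.append({**a, **match})
    return result


# Hypothetical sample data.
a_rows = [
    {"id": 1, "address1": "x@a.com", "address2": "y@a.com"},
    {"id": 2, "address1": "nobody@a.com", "address2": "z@a.com"},
    {"id": 3, "address1": "q@a.com", "address2": None},
]
b_rows = [
    {"email": "x@a.com", "name": "X"},
    {"email": "y@a.com", "name": "Y"},
    {"email": "z@a.com", "name": "Z"},
]

for row in fallback_join(a_rows, b_rows):
    print(row["id"], row["name"])  # 1 X, then 2 Z
```

Row 1 keeps its `address1` match even though `address2` would also match, and row 3 is dropped.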

I have thought of doing something like this:

SELECT *
FROM a
LEFT JOIN b
ON
    a.address1 = b.email
    OR a.address2 = b.email

However, this returned an error message saying: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

What is the correct way of achieving this?

You might be able to phrase your logic using a union:

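-- Inner joins, because rows matched by neither address are discarded anyway.
-- BigQuery requires UNION ALL or UNION DISTINCT; a bare UNION is a syntax error.
SELECT * FROM a JOIN b ON a.address1 = b.email
UNION DISTINCT
SELECT * FROM a JOIN b ON a.address2 = b.email;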
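The union idea can be checked locally with Python's built-in `sqlite3` module, used here only as a stand-in for BigQuery (SQLite's plain `UNION` already removes duplicates, like BigQuery's `UNION DISTINCT`). The tables and rows are hypothetical. Both branches are inner joins so that rows matched by neither address disappear:

```python
import sqlite3

# Hypothetical sample tables; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, address1 TEXT, address2 TEXT);
CREATE TABLE b (email TEXT, name TEXT);
INSERT INTO a VALUES
  (1, 'x@a.com', 'y@a.com'),
  (2, 'nobody@a.com', 'z@a.com'),
  (3, 'q@a.com', NULL);
INSERT INTO b VALUES ('x@a.com', 'X'), ('y@a.com', 'Y'), ('z@a.com', 'Z');
""")

# Union of two inner joins: rows matched by neither address disappear.
rows = conn.execute("""
SELECT a.id, b.name FROM a JOIN b ON a.address1 = b.email
UNION
SELECT a.id, b.name FROM a JOIN b ON a.address2 = b.email
ORDER BY 1, 2
""").fetchall()

print(rows)  # [(1, 'X'), (1, 'Y'), (2, 'Z')]
```

Row 1 comes back twice because both of its addresses match, so a union on its own does not express "use address2 only when address1 finds nothing".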

OR in JOIN conditions makes queries really hard to optimize, because the join can no longer be driven by a single equality between the two sides. So BigQuery makes it hard to use OR.
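As a rough illustration of why OR is awkward (a simplification in Python, not a description of how BigQuery actually executes joins): an equality condition lets the engine index one side by the join key and probe it, whereas an OR condition has to be evaluated pair by pair unless it is split into two equality joins whose results are merged. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Equality join: build a hash index on `right` once, then probe it per left row."""
    index = defaultdict(list)
    for right_row in right:
        if right_row[right_key] is not None:  # NULL never equals anything
            index[right_row[right_key]].append(right_row)
    return [
        (left_row["id"], right_row["name"])
        for left_row in left
        if left_row[left_key] is not None
        for right_row in index.get(left_row[left_key], [])
    ]


def or_join(left, right):
    """OR condition evaluated naively: every (left, right) pair is compared."""
    return [
        (left_row["id"], right_row["name"])
        for left_row in left
        for right_row in right
        if right_row["email"] is not None
        and (left_row["address1"] == right_row["email"]
             or left_row["address2"] == right_row["email"])
    ]


# Hypothetical sample data.
a = [
    {"id": 1, "address1": "x@a.com", "address2": "y@a.com"},
    {"id": 2, "address1": "nobody@a.com", "address2": "z@a.com"},
    {"id": 3, "address1": "q@a.com", "address2": None},
]
b = [
    {"email": "x@a.com", "name": "X"},
    {"email": "y@a.com", "name": "Y"},
    {"email": "z@a.com", "name": "Z"},
]

# The OR join equals the de-duplicated union of two equality joins.
split = set(hash_join(a, b, "address1", "email")) | set(hash_join(a, b, "address2", "email"))
assert set(or_join(a, b)) == split
print(sorted(split))  # [(1, 'X'), (1, 'Y'), (2, 'Z')]
```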

I think you can rephrase the query by doing:

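-- Expand each row of a into one row per address, then join on a single
-- equality. Unmatched addresses come back with NULL columns from b.
SELECT *
FROM a CROSS JOIN
     UNNEST(ARRAY[address1, address2]) address LEFT JOIN
     b
     ON address = b.email;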
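BigQuery's `UNNEST` is not available in SQLite, but the shape of this rewrite can be sketched with Python's `sqlite3` module by replacing `CROSS JOIN UNNEST(...)` with a `UNION ALL` CTE that emits one row per address. The tables, the `id` column, and the CTE name `a_addresses` are hypothetical:

```python
import sqlite3

# Hypothetical sample tables; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, address1 TEXT, address2 TEXT);
CREATE TABLE b (email TEXT, name TEXT);
INSERT INTO a VALUES
  (1, 'x@a.com', 'y@a.com'),
  (2, 'nobody@a.com', 'z@a.com'),
  (3, 'q@a.com', NULL);
INSERT INTO b VALUES ('x@a.com', 'X'), ('y@a.com', 'Y'), ('z@a.com', 'Z');
""")

# The UNION ALL CTE stands in for CROSS JOIN UNNEST([address1, address2]):
# one row per (row of a, address), then a single-equality LEFT JOIN.
rows = conn.execute("""
WITH a_addresses AS (
  SELECT id, address1 AS address FROM a
  UNION ALL
  SELECT id, address2 AS address FROM a
)
SELECT aa.id, aa.address, b.name
FROM a_addresses AS aa
LEFT JOIN b ON aa.address = b.email
ORDER BY aa.id, aa.address
""").fetchall()

for row in rows:
    print(row)
```

Every (row, address) pair is returned, including unmatched ones with NULL `b` columns and both matches for row 1, so a filter such as `WHERE b.email IS NOT NULL` and, if only one match per row is wanted, a ranking step are still needed.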

Consider the approach below (BigQuery):

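SELECT * EXCEPT(row_key, priority)
FROM (
  -- priority 1: matched on address1; 2: matched on address2; 3: no match
  SELECT *, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, 1) priority
  FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
  ON a.address1 = b.email
  UNION ALL
  SELECT *, TO_JSON_STRING(a), IF(b.email IS NULL, 3, 2)
  FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
  ON a.address2 = b.email
)
-- WHERE true is a no-op placeholder for QUALIFY;
-- use WHERE priority < 3 instead to drop rows matched by neither join
WHERE true
-- keep the best-priority row for each row of tableA (identified by row_key)
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1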
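The same priority-and-ranking logic can be sketched in SQLite through Python's `sqlite3` module. SQLite has no `QUALIFY`, so the `ROW_NUMBER()` filter moves to an outer query, and a hypothetical unique `id` column replaces `TO_JSON_STRING(a)` as the per-row key. The `WHERE priority < 3` filter is an addition that drops rows matched by neither join, as the question asks:

```python
import sqlite3

# Hypothetical sample tables; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, address1 TEXT, address2 TEXT);
CREATE TABLE b (email TEXT, name TEXT);
INSERT INTO a VALUES
  (1, 'x@a.com', 'y@a.com'),
  (2, 'nobody@a.com', 'z@a.com'),
  (3, 'q@a.com', NULL);
INSERT INTO b VALUES ('x@a.com', 'X'), ('y@a.com', 'Y'), ('z@a.com', 'Z');
""")

rows = conn.execute("""
SELECT id, name, priority
FROM (
  SELECT id, name, priority,
         -- rank the candidate matches of each row of a (no QUALIFY in SQLite)
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
  FROM (
    SELECT a.id AS id, b.name AS name,
           CASE WHEN b.email IS NULL THEN 3 ELSE 1 END AS priority
    FROM a LEFT JOIN b ON a.address1 = b.email
    UNION ALL
    SELECT a.id, b.name, CASE WHEN b.email IS NULL THEN 3 ELSE 2 END
    FROM a LEFT JOIN b ON a.address2 = b.email
  )
  WHERE priority < 3  -- drop rows matched by neither join
)
WHERE rn = 1
ORDER BY id
""").fetchall()

print(rows)  # [(1, 'X', 1), (2, 'Z', 2)]
```

Row 1 keeps its `address1` match, row 2 falls back to `address2`, and row 3 is dropped.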

Or you can use a less verbose, more generic version of the above, which avoids redundant code fragments; this can be useful if you need more than just two conditions:

SELECT * EXCEPT(row_key, priority)
FROM (
  -- 3 marks "no match"; raise it above the highest candidate priority if you add addresses
  SELECT a.*, b.*, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, c.priority) priority
  FROM `project.dataset.tableA` a,
  -- one (address, priority) candidate per address column
  UNNEST([STRUCT(address1 as address, 1 as priority), (address2, 2)]) c
  LEFT JOIN `project.dataset.tableB` b
  ON c.address = b.email
)
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1
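A sketch of the generic variant, again using SQLite via Python's `sqlite3` as a stand-in: the helper `build_priority_join_sql` (hypothetical) generates one `(address, priority)` candidate row per address column, playing the role of the `UNNEST([STRUCT(...), ...])` list, and an inner join drops candidates without a match. The three-column table and the unique `id` key are assumptions for illustration:

```python
import sqlite3


def build_priority_join_sql(address_columns):
    """Build a SQLite query that tries `address_columns` in order of priority.

    Column names are interpolated directly, so they must be trusted identifiers.
    """
    # Stand-in for UNNEST([STRUCT(address1 AS address, 1 AS priority), (address2, 2), ...]).
    candidates = "\n  UNION ALL\n  ".join(
        f"SELECT id, {col} AS address, {priority} AS priority FROM a"
        for priority, col in enumerate(address_columns, start=1)
    )
    return f"""
WITH c AS (
  {candidates}
),
ranked AS (
  SELECT c.id AS id, b.name AS name, c.priority AS priority,
         ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY c.priority) AS rn
  FROM c JOIN b ON c.address = b.email  -- inner join drops unmatched candidates
)
SELECT id, name, priority FROM ranked WHERE rn = 1 ORDER BY id
"""


# Hypothetical sample tables with three address columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, address1 TEXT, address2 TEXT, address3 TEXT);
CREATE TABLE b (email TEXT, name TEXT);
INSERT INTO a VALUES
  (1, 'x@a.com', 'y@a.com', NULL),
  (2, 'nobody@a.com', NULL, 'z@a.com'),
  (3, 'q@a.com', NULL, NULL);
INSERT INTO b VALUES ('x@a.com', 'X'), ('y@a.com', 'Y'), ('z@a.com', 'Z');
""")

rows = conn.execute(build_priority_join_sql(["address1", "address2", "address3"])).fetchall()
print(rows)  # [(1, 'X', 1), (2, 'Z', 3)]
```

Adding an address column only means extending the list passed to the helper; because unmatched candidates are removed by the inner join, no separate "no match" priority is needed.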

Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.