
Big Query match records between tables

I have two tables:

TableA

ID        Gender    BeginDate
034446    F         2016-01-15T00:00:00
034446    F         2020-02-17T00:00:00
035689    F         2016-01-14T00:00:00
035679    F         2016-01-18T00:00:00
045687    F         2020-05-21T00:00:00

TableB

ID       Gender    Date
34446    F         2016-01-14
35689    F         2016-01-14
35679    F         2016-01-18

I'm trying to figure out how many records (and which ones) from TableB do not match TableA, as well as how many do. Both tables have duplicate IDs, which is why I also need to use the date field to match records. The date in TableB might be a day or two off from the date in TableA. For example, the first row of TableB should match the first row of TableA, not the second row. The join needs a condition saying that the TableA date is either equal to, or within a day or two of, the TableB date. I have attempted a query below, but it only matches dates that are exactly equal.

SELECT a.ID, CONCAT('0',CAST(b.ID AS STRING)), EXTRACT(DATE FROM a.BeginDate) AS date
FROM `dev.tableA` a
LEFT OUTER JOIN `dev.TableB` b
ON a.ID = CONCAT('0',CAST(b.ID AS STRING))
AND EXTRACT(DATE FROM a.BeginDate) = b.Date

If I follow you correctly, you can use not exists:

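-- rows of tableb that have no row in tablea with the same id
-- and a begindate within two days of b.date
select b.*
from tableb b
where not exists (
    select 1
    from tablea a
    where a.id = b.id
        and a.begindate >= date_sub(b.date, interval 2 day)
        and a.begindate <= date_add(b.date, interval 2 day)
)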

This returns the records of b for which no match exists in a with the same id and a date that falls within +/- two days. You can add the casting on id, if that's really necessary, and adjust the boundaries as needed.
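
For reference, here is a sketch of the same query with the casts from the question added. It rests on assumptions the question only implies: that a.ID is a zero-padded STRING, that b.ID is an INT64 missing exactly one leading zero, that a.BeginDate is a DATETIME or TIMESTAMP, and that b.Date is a DATE. The table names are the ones used in the question's query.

```sql
-- Sketch only; the column types are assumptions (see the note above).
-- Returns the rows of TableB that have no counterpart in tableA.
SELECT b.*
FROM `dev.TableB` b
WHERE NOT EXISTS (
    SELECT 1
    FROM `dev.tableA` a
    -- pad b.ID the same way the question's query does
    WHERE a.ID = CONCAT('0', CAST(b.ID AS STRING))
      -- compare DATE to DATE, allowing a window of +/- 2 days
      AND EXTRACT(DATE FROM a.BeginDate)
          BETWEEN DATE_SUB(b.Date, INTERVAL 2 DAY)
              AND DATE_ADD(b.Date, INTERVAL 2 DAY)
)
```

Dropping the NOT returns the rows of TableB that do have a match instead.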

If you want an overall record count, you can just replace b.* with count(*).
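
Since the question asks for both the matched and the unmatched counts, one possible way to get them in a single pass is to compute the exists flag per row and aggregate it. This is a sketch under the same assumptions about column types and ID padding as the question's own query; the has_match, matched and unmatched names are made up for illustration.

```sql
-- Sketch only: counts the TableB rows with and without a match in tableA.
SELECT
  COUNTIF(has_match)     AS matched,
  COUNTIF(NOT has_match) AS unmatched
FROM (
  SELECT
    EXISTS (
      SELECT 1
      FROM `dev.tableA` a
      WHERE a.ID = CONCAT('0', CAST(b.ID AS STRING))
        AND EXTRACT(DATE FROM a.BeginDate)
            BETWEEN DATE_SUB(b.Date, INTERVAL 2 DAY)
                AND DATE_ADD(b.Date, INTERVAL 2 DAY)
    ) AS has_match
  FROM `dev.TableB` b
)
```

Each TableB row is counted once, even if several TableA rows fall inside its date window.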
