SQL Inner Join returns duplicates
I have the following 2 tables: tab1, with 37146 rows, and week_ref, with 730 rows.
All I want to do is join those tables on year and week, so that the first and last day of each week are displayed next to the columns of the first table.
Below is my query:
SELECT tab1.year
,tab1.week
,tab1.col3
,tab1.col4
,tab1.col5
,tab1.col6
,tab1.total
,tab1.col7
,week_ref.first_week_day
,week_ref.last_week_day
FROM dtsetname.tab1
JOIN spyros.week_ref ON (week_ref.year = tab1.year AND week_ref.week = tab1.week)
The query returns the 2 extra columns, but the result has 255535 rows, so it is full of duplicates. I used to understand how joins work, but I guess not anymore xd... Any help on this?
The correct output table should have only 37146 rows, since I just want to add 2 extra columns.
Thanks
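The problem is that your week_ref table has a row for each day rather than one per week (730 rows is two years of daily rows). Each tab1 row therefore matches up to seven week_ref rows, which is why the result is roughly seven times larger than tab1.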
You can select just one day per week. If you have a weekday number or name column (which I'm guessing you do), you can filter on it:
FROM dtsetname.tab1 JOIN
spyros.week_ref wr
ON wr.year = tab1.year AND
wr.week = tab1.week AND
wr.dayname = 'Monday'
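To see the row multiplication and the effect of the one-day filter on a small scale, here is a minimal sketch using Python's built-in sqlite3 module rather than BigQuery. The toy rows and the dayname column are assumptions for illustration (the answer above only guesses that such a column exists).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (year INT, week INT, total INT);
    CREATE TABLE week_ref (year INT, week INT, dayname TEXT,
                           first_week_day TEXT, last_week_day TEXT);
""")

# Two fact rows for the same week (toy data).
conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)",
                 [(2019, 2, 10), (2019, 2, 20)])

# One reference row per calendar day: 7 rows for that single week.
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
conn.executemany(
    "INSERT INTO week_ref VALUES (2019, 2, ?, '2019-01-07', '2019-01-13')",
    [(d,) for d in days])

join_sql = """
    SELECT COUNT(*)
    FROM tab1
    JOIN week_ref wr ON wr.year = tab1.year AND wr.week = tab1.week
"""
# Plain join: every tab1 row matches all 7 daily rows.
fanned_out = conn.execute(join_sql).fetchone()[0]
# Extra ON condition keeps one reference row per week.
filtered = conn.execute(join_sql + " AND wr.dayname = 'Monday'").fetchone()[0]

print(fanned_out)  # 14
print(filtered)    # 2
```

The same check against the real tables would be to count week_ref rows per (year, week): any count above 1 means the join will multiply rows.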
If such a column is not available, then you can either use extract() to derive the information or aggregate:
FROM dtsetname.tab1 JOIN
(SELECT ANY_VALUE(wr).*
FROM spyros.week_ref wr
GROUP BY wr.year, wr.week
) wr
ON wr.year = tab1.year AND
wr.week = tab1.week
Below is for BigQuery Standard SQL
Before joining, you just need to deduplicate the data in the week_ref table, as in the example below:
#standardSQL
SELECT tab1.year
,tab1.week
,tab1.col3
,tab1.col4
,tab1.col5
,tab1.col6
,tab1.total
,tab1.col7
,week_ref.first_week_day
,week_ref.last_week_day
FROM dtsetname.tab1 tab1
JOIN (SELECT DISTINCT year, week, first_week_day, last_week_day FROM spyros.week_ref) week_ref
ON (week_ref.year = tab1.year AND week_ref.week = tab1.week)
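As a rough check of the deduplication idea, the sketch below runs the same join shape in SQLite via Python's sqlite3 module and compares the result size with tab1. The toy rows and the day column are assumptions. Note that DISTINCT only collapses a week to one row if first_week_day and last_week_day are identical on every daily row of that week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (year INT, week INT, total INT);
    CREATE TABLE week_ref (year INT, week INT, day TEXT,
                           first_week_day TEXT, last_week_day TEXT);
    INSERT INTO tab1 VALUES (2019, 2, 10), (2019, 2, 20), (2019, 3, 30);
    INSERT INTO week_ref VALUES
        (2019, 2, '2019-01-07', '2019-01-07', '2019-01-13'),
        (2019, 2, '2019-01-08', '2019-01-07', '2019-01-13'),
        (2019, 3, '2019-01-14', '2019-01-14', '2019-01-20'),
        (2019, 3, '2019-01-15', '2019-01-14', '2019-01-20');
""")

# Join against a per-week view of week_ref instead of the daily rows.
rows = conn.execute("""
    SELECT tab1.year, tab1.week, tab1.total,
           wr.first_week_day, wr.last_week_day
    FROM tab1
    JOIN (SELECT DISTINCT year, week, first_week_day, last_week_day
          FROM week_ref) wr
      ON wr.year = tab1.year AND wr.week = tab1.week
    ORDER BY tab1.total
""").fetchall()

tab1_count = conn.execute("SELECT COUNT(*) FROM tab1").fetchone()[0]
# Sanity check: the join should neither add nor drop rows.
print(len(rows) == tab1_count)  # True
```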
First, I hope that year+week and year+day are primary keys in the corresponding tables; otherwise the problem is there.
If so, here is another thing to check: I notice that you join them by year and week; however, in the first table I see many rows with 52 in the week column, and in the second one I see 0 as a value.
There are only 52 weeks in a year, plus a day, so is it possible you need to join by
week_ref.year = tab1.year AND week_ref.week = tab1.week+1
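Before shifting the week number in either direction, it may help to list which (year, week) keys in tab1 find no partner in week_ref. The sketch below does this with an anti-join in SQLite via Python's sqlite3 module. The toy rows are assumptions, so whether an offset is needed at all depends on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (year INT, week INT);
    CREATE TABLE week_ref (year INT, week INT);
    INSERT INTO tab1 VALUES (2019, 1), (2019, 52), (2019, 53);
    INSERT INTO week_ref VALUES (2019, 0), (2019, 1), (2019, 52);
""")

# LEFT JOIN plus IS NULL keeps only the tab1 keys without a match.
unmatched = conn.execute("""
    SELECT DISTINCT t.year, t.week
    FROM tab1 t
    LEFT JOIN week_ref wr ON wr.year = t.year AND wr.week = t.week
    WHERE wr.week IS NULL
    ORDER BY t.year, t.week
""").fetchall()

print(unmatched)  # [(2019, 53)]
```

An empty result would suggest the two tables number their weeks the same way.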
I think the solutions mentioned by others should work if you are looking to join to your reference table to get week start/end dates.
However, if you think your tab1 table has definite values in the week and year columns (and if I understand your data correctly), you can avoid the join altogether and compute the desired columns directly:
select
year
,week
,col3
,col4
,col5
,col6
,total
,col7
,date_sub(weekdate, interval IF(EXTRACT(DAYOFWEEK FROM weekdate) = 1, 6, EXTRACT(DAYOFWEEK FROM weekdate) - 2) day) as first_week_day
,date_add(date_sub(weekdate, interval IF(EXTRACT(DAYOFWEEK FROM weekdate) = 1, 6, EXTRACT(DAYOFWEEK FROM weekdate) - 2) day), interval 6 day) as last_week_day
from (
select
tab1.year
,tab1.week
,tab1.col3
,tab1.col4
,tab1.col5
,tab1.col6
,tab1.total
,tab1.col7
,date_add(date(cast(tab1.year as int64), 1, 1), interval cast(tab1.week as int64) week) as weekdate
from `mydataset.tab1` as tab1
)
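The date arithmetic is easier to follow in a few lines of Python. This is only a rough rendering of the same idea, assuming weeks run Monday to Sunday; week_bounds is a hypothetical helper name. Whether the result agrees with week_ref depends on how the week column is numbered, so it is worth comparing a few values against the reference table first.

```python
from datetime import date, timedelta

def week_bounds(year: int, week: int) -> tuple:
    """Return (first_day, last_day) of the week containing Jan 1 + `week` weeks."""
    # Approximate a date inside the target week: 1 January plus `week` weeks.
    anchor = date(year, 1, 1) + timedelta(weeks=week)
    # date.weekday() is 0 for Monday through 6 for Sunday,
    # so subtracting it steps back to the Monday of that week.
    first_day = anchor - timedelta(days=anchor.weekday())
    last_day = first_day + timedelta(days=6)
    return first_day, last_day

first_day, last_day = week_bounds(2019, 2)
print(first_day, last_day)  # 2019-01-14 2019-01-20
```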
Hope it helps :)