SQL inner join returns duplicates

Question

I have the following 2 tables:

tab1 with 37146 rows

week_ref with 730 rows

All I want to do is join those tables on year and week, so that the first week day and last week day are displayed next to the columns of the first table.

Below is my query:

SELECT tab1.year
      ,tab1.week
      ,tab1.col3
      ,tab1.col4
      ,tab1.col5
      ,tab1.col6
      ,tab1.total
      ,tab1.col7
      ,week_ref.first_week_day
      ,week_ref.last_week_day

FROM dtsetname.tab1

JOIN spyros.week_ref ON (week_ref.year = tab1.year AND week_ref.week = tab1.week)

The query returns the 2 extra columns, but the result has 255535 rows, so it is full of duplicates. I used to get how joins work, but I guess not anymore xd... Any help on this? The correct output table should have only 37146 rows, since I just want to add 2 extra columns.

Thanks

Answer 1

The problem is that your week_ref table has a row for each day rather than one per week, so each tab1 row matches up to seven week_ref rows (one for every day of its week).

You can select just one day per week. If you have a weekday number or name column (which I'm guessing you do), you can use it:
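To see why the row count grows, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The data is made up for illustration: week_ref is assumed to hold one row per day (seven rows for one week), and tab1 holds three rows that all fall in that week.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical reference table: one row per DAY, so (2019, 29) appears 7 times.
cur.execute(
    "CREATE TABLE week_ref (year INT, week INT, day TEXT, "
    "first_week_day TEXT, last_week_day TEXT)"
)
monday = date(2019, 7, 15)
for offset in range(7):
    cur.execute(
        "INSERT INTO week_ref VALUES (?, ?, ?, ?, ?)",
        (2019, 29, (monday + timedelta(days=offset)).isoformat(),
         monday.isoformat(), (monday + timedelta(days=6)).isoformat()),
    )

# Hypothetical fact table: three rows in week 29 of 2019.
cur.execute("CREATE TABLE tab1 (year INT, week INT, total INT)")
cur.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(2019, 29, 10), (2019, 29, 20), (2019, 29, 30)],
)

tab1_rows = cur.execute("SELECT COUNT(*) FROM tab1").fetchone()[0]
joined_rows = cur.execute(
    "SELECT COUNT(*) FROM tab1 JOIN week_ref "
    "ON week_ref.year = tab1.year AND week_ref.week = tab1.week"
).fetchone()[0]

# Every tab1 row matches all 7 day rows of its week: 3 * 7 = 21.
print(tab1_rows, joined_rows)  # 3 21
```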

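SELECT tab1.year, tab1.week, tab1.col3, tab1.col4, tab1.col5, tab1.col6,
       tab1.total, tab1.col7, wr.first_week_day, wr.last_week_day
FROM dtsetname.tab1 JOIN
     spyros.week_ref wr
     ON wr.year = tab1.year AND
        wr.week = tab1.week AND
        wr.dayname = 'Monday'  -- dayname is a guessed column name; use your actual weekday column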
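A minimal sketch of the weekday filter, again with sqlite3 standing in for BigQuery and made-up data. The dayname column and its values are assumptions, as in the answer; the extra join condition leaves exactly one matching week_ref row per week, so the join keeps the row count of tab1.

```python
import sqlite3
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical per-day reference table with an assumed "dayname" column.
cur.execute(
    "CREATE TABLE week_ref (year INT, week INT, dayname TEXT, "
    "first_week_day TEXT, last_week_day TEXT)"
)
monday = date(2019, 7, 15)
sunday = monday + timedelta(days=6)
for name in DAY_NAMES:
    cur.execute(
        "INSERT INTO week_ref VALUES (?, ?, ?, ?, ?)",
        (2019, 29, name, monday.isoformat(), sunday.isoformat()),
    )

cur.execute("CREATE TABLE tab1 (year INT, week INT, total INT)")
cur.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(2019, 29, 10), (2019, 29, 20), (2019, 29, 30)],
)

# Matching only the Monday row makes the join one-to-one per week.
rows = cur.execute(
    """
    SELECT tab1.year, tab1.week, tab1.total, wr.first_week_day, wr.last_week_day
    FROM tab1
    JOIN week_ref wr
      ON wr.year = tab1.year
     AND wr.week = tab1.week
     AND wr.dayname = 'Monday'
    """
).fetchall()

print(len(rows))  # 3
```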

If such a column is not available, you can either extract() the information from a date column or aggregate:

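SELECT tab1.year, tab1.week, tab1.col3, tab1.col4, tab1.col5, tab1.col6,
       tab1.total, tab1.col7, wr.first_week_day, wr.last_week_day
FROM dtsetname.tab1 JOIN
     (SELECT ANY_VALUE(wr).*   -- keep one arbitrary row per (year, week)
      FROM spyros.week_ref wr
      GROUP BY wr.year, wr.week
     ) wr
     ON wr.year = tab1.year AND
        wr.week = tab1.week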

Answer 2

The following is for BigQuery Standard SQL.

Before JOINing, you just need to dedup the data in the week_ref table, as in the example below:

#standardSQL
SELECT tab1.year
      ,tab1.week
      ,tab1.col3
      ,tab1.col4
      ,tab1.col5
      ,tab1.col6
      ,tab1.total
      ,tab1.col7
      ,week_ref.first_week_day
      ,week_ref.last_week_day
FROM dtsetname.tab1 tab1
JOIN (SELECT DISTINCT year, week, first_week_day, last_week_day FROM spyros.week_ref) week_ref
ON (week_ref.year = tab1.year AND week_ref.week = tab1.week)
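The effect of the DISTINCT subquery can be checked locally. This is a minimal sketch with sqlite3 in place of BigQuery and made-up data for two weeks. DISTINCT collapses the seven per-day rows of a week into one only because first_week_day and last_week_day are assumed to be identical for every day of the same week.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical per-day reference table covering two consecutive weeks.
cur.execute(
    "CREATE TABLE week_ref (year INT, week INT, day TEXT, "
    "first_week_day TEXT, last_week_day TEXT)"
)
for week, monday in [(29, date(2019, 7, 15)), (30, date(2019, 7, 22))]:
    for offset in range(7):
        cur.execute(
            "INSERT INTO week_ref VALUES (?, ?, ?, ?, ?)",
            (2019, week, (monday + timedelta(days=offset)).isoformat(),
             monday.isoformat(), (monday + timedelta(days=6)).isoformat()),
        )

cur.execute("CREATE TABLE tab1 (year INT, week INT, total INT)")
cur.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?)",
    [(2019, 29, 10), (2019, 29, 20), (2019, 30, 5)],
)

# Deduplicate week_ref to one row per (year, week) before joining.
rows = cur.execute(
    """
    SELECT tab1.year, tab1.week, tab1.total, wr.first_week_day, wr.last_week_day
    FROM tab1
    JOIN (SELECT DISTINCT year, week, first_week_day, last_week_day
          FROM week_ref) wr
      ON wr.year = tab1.year AND wr.week = tab1.week
    ORDER BY tab1.week, tab1.total
    """
).fetchall()

for row in rows:
    print(row)
```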

Answer 3

First, I hope that year+week and year+day are primary keys in the corresponding tables; otherwise, that is where the problem is.
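One way to test that key assumption is to count rows per key and look for counts above 1. A minimal sketch using sqlite3 and made-up data (seven rows per week, as a per-day table would have); in BigQuery the same GROUP BY ... HAVING query could be run against spyros.week_ref.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical per-day reference table: 7 rows each for weeks 29 and 30.
cur.execute("CREATE TABLE week_ref (year INT, week INT, day_of_week INT)")
cur.executemany(
    "INSERT INTO week_ref VALUES (?, ?, ?)",
    [(2019, week, dow) for week in (29, 30) for dow in range(1, 8)],
)

# Any (year, week) listed here is not unique, so a join on it fans out.
dupes = cur.execute(
    """
    SELECT year, week, COUNT(*) AS n
    FROM week_ref
    GROUP BY year, week
    HAVING COUNT(*) > 1
    ORDER BY year, week
    """
).fetchall()

print(dupes)  # [(2019, 29, 7), (2019, 30, 7)]
```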

If so, here is another hint to check: I notice that you join them by year and week; however, in the first table I see many 52s in the week column, and in the second one I see 0 as a value.

There are only 52 weeks in a year, plus a day (two in leap years), so is it possible that you need to join by

week_ref.year = tab1.year AND week_ref.week = tab1.week+1

Answer 4

I think the solutions mentioned by others should work if you want to join to your reference table to get the week start/end dates.

However, if you think your tab1 table has definite values in the week and year columns (and if I understand your data correctly), you can avoid the join altogether and compute the desired results directly:

select 
  year
  ,week
  ,col3
  ,col4
  ,col5
  ,col6
  ,total
  ,col7
  -- DAYOFWEEK returns 1 (Sunday) to 7 (Saturday); step back to the Monday of that week
  ,date_sub(weekdate, interval IF(EXTRACT(DAYOFWEEK FROM weekdate) = 1, 6, EXTRACT(DAYOFWEEK FROM weekdate) - 2) day) as first_week_day
  ,date_add(date_sub(weekdate, interval IF(EXTRACT(DAYOFWEEK FROM weekdate) = 1, 6, EXTRACT(DAYOFWEEK FROM weekdate) - 2) day), interval 6 day) as last_week_day
from (  
  select 
     tab1.year
    ,tab1.week
    ,tab1.col3
    ,tab1.col4
    ,tab1.col5
    ,tab1.col6
    ,tab1.total
    ,tab1.col7
    ,date_add(date(cast(tab1.year as int64), 1, 1), interval cast(tab1.week as int64) week) as weekdate
  from `mydataset.tab1` as tab1
)
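The week-boundary arithmetic above can be cross-checked outside BigQuery. Below is a minimal Python sketch of the same approach, assuming weeks run Monday to Sunday (which the query's Sunday special case suggests) and that weekdate is January 1 plus `week` whole weeks, as in the inner query; week_bounds is a hypothetical helper name. Python's date.weekday() returns 0 for Monday, so no Sunday special case is needed.

```python
from datetime import date, timedelta


def week_bounds(year: int, week: int) -> tuple[date, date]:
    """Return (first_week_day, last_week_day) for a Monday-to-Sunday week.

    Starts from January 1, adds `week` whole weeks, then steps back to
    the Monday of the week containing that date.
    """
    weekdate = date(year, 1, 1) + timedelta(weeks=week)
    first = weekdate - timedelta(days=weekdate.weekday())  # weekday(): Monday == 0
    last = first + timedelta(days=6)
    return first, last


print(week_bounds(2019, 28))
# (datetime.date(2019, 7, 15), datetime.date(2019, 7, 21))
```

A Sunday input such as January 1, 2023 (week_bounds(2023, 0)) steps back six days to Monday, December 26, 2022, which is the case the SQL handles with its IF branch.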

Hope it helps :)

4 answers

Answer 1: score 1, 2019-07-15 14:45:07

Answer 2: score 1, accepted, 2019-07-15 17:55:48

Answer 3: score 0, 2019-07-15 14:47:57

Answer 4: score 0, 2019-07-16 04:11:03
