
Multiple inner joins in BigQuery result in duplicated rows

I'm trying to join 6 tables in BigQuery named T0, T1, T2, T3, T4 and T5. The results I'm interested in come from T0 and T1. When I query just these two tables I get 43 matches:

SELECT  
        T1.F1, 
        T0.F2, 
        T0.F3, 
        T0.F4, 
        T1.F5,
        T1.F6,
        T1.F7,
        T1.F8,
        T0.F9
        FROM `TABLE0` T0
        INNER JOIN `TABLE1` T1 on T1.F1= T0.F1
        WHERE T0.F1 = "010001476713" 
        AND T0.F2 = T1.F2
        ORDER BY T0.F4

But when I run it with multiple INNER JOINs I get 800 results instead of 43; the rows are duplicated:

SELECT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1, 
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2, 
T3.F23,
T0.F3, 
T0.F4, 
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
AND T0.F2 = T1.F2
ORDER BY T0.F4

When I get duplicate rows, I solve it like this:

You get 43 results on your inner join of tables T0 and T1. So far so good.

Now comment out everything related to tables T2, T4 and T5, like this (I've placed the commas at the beginning of each line to make commenting out easier):

SELECT
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1 
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2 
,T3.F23
,T0.F3 
,T0.F4 
,T1.F5
,T1.F6
,T1.F7
,T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
ORDER BY T0.F4

I've moved the and T0.F2 = T1.F2 condition from the where clause to the on clause of the inner join. When you run this query, do you still get 43 rows, or more? If more, you need to figure out which column it is double matching on and add a condition for it to your on clause. You may need to comment out your select list and select everything to really figure it out, like this:

SELECT *
/*
--T2.F11,
T3.F15
--,T2.F12
,T3.F16
,T3.F17
,T1.F1 
--,T2.F13
,T3.F17
--,T5.F18
--,T5.F19
--,T5.F20
--,T2.F14
,T0.F9
,T1.F10
--,T4.F3
--,T4.F21
--,T4.F22
,T0.F2 
,T3.F23
,T0.F3 
,T0.F4 
,T1.F5
,T1.F6
,T1.F7
,T1.F8
*/
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1 and T0.F2 = T1.F2
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
--INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
--INNER JOIN `TABLE4` T4 ON T4.F3 = T0.F3
--INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
ORDER BY T0.F4
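
Instead of reading through the full SELECT * output, it can be quicker to count how many rows the newly joined table holds for the key you are filtering on. A rough sketch, using only the table and column names from the query above and assuming the extra rows come from TABLE3 having several rows per F1:

```sql
-- Count the TABLE3 rows that share the join key value.
-- If n is greater than 1, every T0/T1 match is repeated n times by this join.
SELECT
  T3.F1,
  COUNT(*) AS n
FROM `TABLE3` T3
WHERE T3.F1 = "010001476713"
GROUP BY T3.F1
```

The same check should work for the other tables by swapping in their join column (F24 for TABLE2, F3 for TABLE4, F1 for TABLE5).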

Once you figure out which rows are causing the duplication, add an 'and' condition to the on clause to make the join 1-1, and then move on. Then uncomment the parts of the query related to T2 and do the same thing, then T4, and then T5. If you send me the results of the query above, I can help you figure out what your on clause needs to be to keep it from duplicating.
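
To see why an on clause that is not 1-1 inflates the row count, here is a small self-contained illustration. The tables t0 and t4, the key k and all the values are made up for the example; they are not from your schema:

```sql
-- Toy data: two rows in t0 and three rows in t4 all share the key value 1.
WITH t0 AS (
  SELECT 1 AS k, 'a' AS v UNION ALL
  SELECT 1, 'b'
),
t4 AS (
  SELECT 1 AS k, 'x' AS w UNION ALL
  SELECT 1, 'y' UNION ALL
  SELECT 1, 'z'
)
-- Each t0 row pairs with every t4 row that has the same key: 2 * 3 = 6 rows.
SELECT COUNT(*) AS joined_rows
FROM t0
INNER JOIN t4 ON t4.k = t0.k
```

Every additional join whose key is not unique multiplies the count again, which is how 43 rows can grow to 800.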

Thank you @jenstretman. I found that table 4 was duplicating matches: it is joined through a foreign key that matches a non-primary key, which creates duplicates. The solution was to use DISTINCT so that only the distinct matching rows are selected.

SELECT DISTINCT
T2.F11,
T3.F15,
T2.F12,
T3.F16,
T3.F17,
T1.F1, 
T2.F13,
T3.F17,
T5.F18,
T5.F19,
T5.F20,
T2.F14,
T0.F9,
T1.F10,
T4.F3,
T4.F21,
T4.F22,
T0.F2, 
T3.F23,
T0.F3, 
T0.F4, 
T1.F5,
T1.F6,
T1.F7,
T1.F8
FROM `TABLE0` T0
INNER JOIN `TABLE1` T1 ON T1.F1= T0.F1
INNER JOIN `TABLE3`  T3 ON T3.F1=T1.F1
INNER JOIN `TABLE2` T2 ON T2.F24 = T3.F24
INNER JOIN (SELECT DISTINCT T4.F3, T4.F21, T4.F22 FROM `TABLE4` T4) T4 ON T4.F3 = T0.F3
INNER JOIN `TABLE5` as T5 ON T5.F1=T0.F1
WHERE T0.F1 = "010001476713" 
AND T0.F2 = T1.F2
ORDER BY T0.F4
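
Note that SELECT DISTINCT on the subquery only collapses rows that are identical in F3, F21 and F22. If TABLE4 can hold rows with the same F3 but different F21 or F22, the join would still multiply rows. A possible variant, assuming it is acceptable to keep a single row per F3, is to rank the rows and keep the first one. The alias rn is introduced here only for illustration:

```sql
-- Keep exactly one TABLE4 row per F3 value.
-- The ORDER BY inside the window decides which row survives; adjust it to your data.
SELECT F3, F21, F22
FROM (
  SELECT
    T4.F3,
    T4.F21,
    T4.F22,
    ROW_NUMBER() OVER (PARTITION BY T4.F3 ORDER BY T4.F21, T4.F22) AS rn
  FROM `TABLE4` T4
)
WHERE rn = 1
```

Used in place of the SELECT DISTINCT derived table in the join, this should guarantee at most one match per T0.F3.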
