Performance comparison on Join with OR on 2 predicates vs 2 separate joins with 1 predicate each
What's the performance impact of a join that uses 2 predicates combined with an OR in the ON clause, like so:
SELECT GS.GuitarType, GD.GuitarColor
FROM Prod.Guitars GS
LEFT JOIN Prod.Guitar_Detail GD ON (GS.GuitarID = GD.GuitarID OR GS.GuitarID = GD.GuitarCatNum)
versus something like this:
SELECT GS.GuitarType, GD.GuitarColor
FROM Prod.Guitars GS
LEFT JOIN Prod.Guitar_Detail GD ON GS.GuitarID = GD.GuitarID
LEFT JOIN Prod.Guitar_Detail GD2 ON GS.GuitarID = GD.GuitarCatNum
A couple of caveats: we have to use LEFT JOIN and can't use INNER. I've run both queries and the latter performs better.
Another question: the 2nd won't return more rows, right? Because both joins are against the same table, they should both preserve only the GS table, right?
In the first query, does it have to match twice? If not, why does it perform differently from the second?
Let me answer in reverse order.
Also another question, the 2nd won't return more rows right? Because they're both being joined on the same table, they should both preserve the GS table only right?
The queries are different (the difference lies in how nulls are treated), so different execution times are to be expected. Everything boils down to how GD.GuitarID and GD.GuitarCatNum are used.
a) If GD.GuitarID is set and GD.GuitarCatNum is null, the queries will return the same data.
b) If GD.GuitarID is set and GD.GuitarCatNum contains the same value as GD.GuitarID, the second query will return duplicate rows.
c) If GD.GuitarID is null and GD.GuitarCatNum is set, the queries will return the same number of rows, but the second query will return GD.GuitarColor as null.
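To try the three cases yourself, a small test setup along these lines may help. This is only a sketch: the column types and sample values are assumptions (the question does not show the table definitions), and the schema prefix is dropped as in the plans below. Load one case at a time and compare the output of the two queries.

```sql
-- Hypothetical schema: column types are assumed, not taken from the question.
CREATE TABLE Guitars (
    GuitarID   INTEGER NOT NULL,
    GuitarType VARCHAR(30)
);

CREATE TABLE Guitar_Detail (
    GuitarID     INTEGER,
    GuitarCatNum INTEGER,
    GuitarColor  VARCHAR(30)
);

-- Illustrative sample rows (made up for this sketch).
INSERT INTO Guitars VALUES (1, 'Electric');
INSERT INTO Guitars VALUES (2, 'Acoustic');

-- Case a): GuitarID set, GuitarCatNum null.
INSERT INTO Guitar_Detail VALUES (1, NULL, 'Red');
INSERT INTO Guitar_Detail VALUES (2, NULL, 'Blue');

-- For case b), load these rows instead: (1, 1, 'Red'), (2, 2, 'Blue')
-- For case c), load these rows instead: (NULL, 1, 'Red'), (NULL, 2, 'Blue')

-- Query 1: single join with OR in the ON clause.
SELECT GS.GuitarType, GD.GuitarColor
FROM Guitars GS
LEFT JOIN Guitar_Detail GD
    ON (GS.GuitarID = GD.GuitarID OR GS.GuitarID = GD.GuitarCatNum);

-- Query 2: two joins, exactly as posted
-- (the second ON clause references GD, not GD2).
SELECT GS.GuitarType, GD.GuitarColor
FROM Guitars GS
LEFT JOIN Guitar_Detail GD
    ON GS.GuitarID = GD.GuitarID
LEFT JOIN Guitar_Detail GD2
    ON GS.GuitarID = GD.GuitarCatNum;
```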
Now, assuming case a), the execution plans look like this:
Case 1)
SELECT
GS.GuitarType,
GD.GuitarColor
FROM
Guitars GS
LEFT JOIN Guitar_Detail GD
ON (GS.GuitarID = GD.GuitarID OR
GS.GuitarID = GD.GuitarCatNum)
Access Plan:
-----------
Total Cost: 18.3602
Query Degree: 1
               Rows
              RETURN
               ( 1)
               Cost
                I/O
                 |
                 3
              >NLJOIN
               ( 2)
              18.3602
                 2
           /-----+------\
         2               1.5
      TBSCAN           TBSCAN
       ( 3)             ( 4)
      8.99536          9.07676
         1                1
         |                |
         2                2
  TABLE: DB2INST1  TABLE: DB2INST1
      GUITARS       GUITAR_DETAIL
        Q2               Q1
Case 2)
SELECT
GS.GuitarType,
GD.GuitarColor
FROM
Guitars GS
LEFT JOIN Guitar_Detail GD
ON GS.GuitarID = GD.GuitarID
LEFT JOIN Guitar_Detail GD2
ON GS.GuitarID = GD.GuitarCatNum
Total Cost: 27.2798
Query Degree: 1
                                Rows
                               RETURN
                                ( 1)
                                Cost
                                 I/O
                                  |
                                  2
                               >NLJOIN
                                ( 2)
                               27.2798
                                  3
                   /--------------+---------------\
                 2                                 1
              HSJOIN<                           NLJOIN
               ( 3)                              ( 6)
              18.0326                           9.01796
                 2                                 1
           /-----+------\                    /-----+------\
         2                2               0.5               2
      TBSCAN           TBSCAN           TBSCAN           TBSCAN
       ( 4)             ( 5)             ( 7)             ( 8)
      8.99536          8.99536          0.0226           8.99536
         1                1                0                1
         |                |                |                |
         2                2                1                2
  TABLE: DB2INST1  TABLE: DB2INST1  TABFNC: SYSIBM   TABLE: DB2INST1
   GUITAR_DETAIL       GUITARS          GENROW        GUITAR_DETAIL
        Q2               Q1               Q4               Q6
Hope this helps.
OR usually performs badly, especially in joins. It is best to design your database so that you don't need these types of joins.
However, we are all stuck with an existing design at times. In that case, it is often more performant to use a UNION ALL (if the two join fields are mutually exclusive). A UNION would be slower, but it is the better choice if the fields are not mutually exclusive and you don't want duplicates.
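As a rough sketch of the UNION ALL idea that still keeps the required LEFT JOIN, one option is to stack the two candidate key columns into a derived table and join on a single equality predicate. The alias D and the column name JoinKey are made up for illustration, and whether this actually runs faster depends on your optimizer and indexes, so compare the plans before relying on it.

```sql
-- Sketch only: D and JoinKey are illustrative names, not from the original schema.
-- Assumes no Guitar_Detail row has GuitarID = GuitarCatNum (the "mutually
-- exclusive" condition); otherwise that row would match twice and be duplicated.
SELECT GS.GuitarType, D.GuitarColor
FROM Prod.Guitars GS
LEFT JOIN (
    -- Each detail row appears once per candidate key.
    SELECT GuitarID AS JoinKey, GuitarColor
    FROM Prod.Guitar_Detail
    UNION ALL
    SELECT GuitarCatNum AS JoinKey, GuitarColor
    FROM Prod.Guitar_Detail
) D
    ON GS.GuitarID = D.JoinKey;
```

Rows in the derived table whose JoinKey is null never satisfy the equality, so they do not add matches. Swapping UNION ALL for UNION removes repeated (JoinKey, GuitarColor) pairs, but note that it would also collapse distinct detail rows that happen to share the same key and color.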