
Performance comparison: join with OR on 2 predicates vs. 2 separate joins with 1 predicate each

What's the performance impact of using a join with 2 predicates combined by OR in the ON clause, like so:

 SELECT GS.GuitarType, GD.GuitarColor
 FROM Prod.Guitars GS
 LEFT JOIN Prod.Guitar_Detail GD ON (GS.GuitarID = GD.GuitarID OR GS.GuitarID  = GD.GuitarCatNum)

versus something like this:

 SELECT GS.GuitarType, GD.GuitarColor
 FROM Prod.Guitars GS
 LEFT JOIN Prod.Guitar_Detail GD ON GS.GuitarID = GD.GuitarID 
 LEFT JOIN Prod.Guitar_Detail GD2 ON GS.GuitarID  = GD.GuitarCatNum

A couple of caveats: we have to use LEFT JOIN; we can't use INNER. I've run both queries and the latter performs better.

Another question: the second query won't return more rows, right? Because both joins are against the same table, they should both preserve only the GS table, right?

In the first query, does it have to match twice? If not, why does it perform differently from the second?

Let me answer in reverse order.

Another question: the second query won't return more rows, right? Because both joins are against the same table, they should both preserve only the GS table, right?

The queries are different (the difference being in how nulls are treated), so different execution times should be expected. Everything boils down to how GD.GuitarID and GD.GuitarCatNum are used.

a) If GD.GuitarID is set and GD.GuitarCatNum is null, the queries will return the same data.
b) If GD.GuitarID is set and GD.GuitarCatNum contains the same value as GD.GuitarID, the second query will return duplicate rows.
c) If GD.GuitarID is null and GD.GuitarCatNum is set, the queries will return the same number of rows, but the second query will return GD.GuitarColor as null.
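The three cases above can be checked on a toy data set. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) in place of the DB2 instance used for the plans, with made-up rows and a hypothetical helper named compare. The second query is taken exactly as written in the post, including the second ON clause that references GD rather than GD2, so results may differ if that alias is changed.

```python
import sqlite3

# Query 1: a single LEFT JOIN with OR in the ON clause.
OR_JOIN = """
SELECT GS.GuitarType, GD.GuitarColor
FROM Guitars GS
LEFT JOIN Guitar_Detail GD
  ON (GS.GuitarID = GD.GuitarID OR GS.GuitarID = GD.GuitarCatNum)
"""

# Query 2: two LEFT JOINs, copied as posted (second ON references GD, not GD2).
TWO_JOINS = """
SELECT GS.GuitarType, GD.GuitarColor
FROM Guitars GS
LEFT JOIN Guitar_Detail GD ON GS.GuitarID = GD.GuitarID
LEFT JOIN Guitar_Detail GD2 ON GS.GuitarID = GD.GuitarCatNum
"""


def compare(guitars, details):
    """Load the sample rows into an in-memory database and run both queries."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Guitars (GuitarID INTEGER, GuitarType TEXT)")
    con.execute(
        "CREATE TABLE Guitar_Detail "
        "(GuitarID INTEGER, GuitarCatNum INTEGER, GuitarColor TEXT)"
    )
    con.executemany("INSERT INTO Guitars VALUES (?, ?)", guitars)
    con.executemany("INSERT INTO Guitar_Detail VALUES (?, ?, ?)", details)
    or_rows = con.execute(OR_JOIN).fetchall()
    two_rows = con.execute(TWO_JOINS).fetchall()
    con.close()
    return or_rows, two_rows


# Illustrative rows only; None becomes SQL NULL.
CASES = {
    "a": ([(1, "Strat")], [(1, None, "Red")]),
    "b": ([(1, "Strat")], [(1, 1, "Red"), (9, 9, "Black")]),
    "c": ([(2, "Tele")], [(None, 2, "Blue")]),
}

if __name__ == "__main__":
    for name, (guitars, details) in CASES.items():
        or_rows, two_rows = compare(guitars, details)
        print(name, "OR join:", or_rows, "| two joins:", two_rows)
```

With these rows, case a gives identical results, case b gives one row from the OR join and two identical rows from the two-join form, and case c gives one row each, with the color present only in the OR join result.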

Now, assuming case a), the execution plans look like this:

Case 1)

SELECT 
  GS.GuitarType,
  GD.GuitarColor 
FROM 
  Guitars GS 
  LEFT JOIN Guitar_Detail GD 
  ON (GS.GuitarID = GD.GuitarID OR 
      GS.GuitarID = GD.GuitarCatNum)

Access Plan:
-----------
    Total Cost:         18.3602
    Query Degree:       1

              Rows 
             RETURN
             (   1)
              Cost 
               I/O 
               |
                3 
             >NLJOIN
             (   2)
             18.3602 
                2 
         /-----+------\
        2               1.5 
     TBSCAN           TBSCAN
     (   3)           (   4)
     8.99536          9.07676 
        1                1 
       |                |
        2                2 
 TABLE: DB2INST1    TABLE: DB2INST1  
     GUITARS       GUITAR_DETAIL
       Q2               Q1

Case 2)

SELECT 
  GS.GuitarType,
  GD.GuitarColor 
FROM 
  Guitars GS 
  LEFT JOIN Guitar_Detail GD 
  ON GS.GuitarID = GD.GuitarID 
  LEFT JOIN Guitar_Detail GD2 
  ON GS.GuitarID = GD.GuitarCatNum

    Total Cost:         27.2798
    Query Degree:       1

                               Rows 
                              RETURN
                              (   1)
                               Cost 
                                I/O 
                                |
                                 2 
                              >NLJOIN
                              (   2)
                              27.2798 
                                 3 
                 /--------------+---------------\
                2                                  1 
             HSJOIN<                            NLJOIN
             (   3)                             (   6)
             18.0326                            9.01796 
                2                                  1 
         /-----+------\                     /-----+------\
        2                2                0.5               2 
     TBSCAN           TBSCAN            TBSCAN           TBSCAN
     (   4)           (   5)            (   7)           (   8)
     8.99536          8.99536           0.0226           8.99536 
        1                1                 0                1 
       |                |                 |                |
        2                2                 1                2 
 TABLE: DB2INST1    TABLE: DB2INST1    TABFNC: SYSIBM    TABLE: DB2INST1  
  GUITAR_DETAIL       GUITARS           GENROW        GUITAR_DETAIL
       Q2               Q1                Q4               Q6

Hope this helps.

OR usually performs badly, especially in joins. It is best to design your database so that you don't need these types of joins.

However, we are all stuck with a given design at times. In that case, it is often more performant to use a UNION ALL (if the two join fields are mutually exclusive). A UNION would be slower, but is the better choice if the fields are not mutually exclusive and you don't want duplicates.
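The answer does not say how the UNION ALL rewrite would look. One possible reading, sketched here under assumptions, is to union the two key columns of Guitar_Detail into a single derived column (called MatchKey here, a name invented for illustration) and LEFT JOIN against that with a plain equality. The sketch uses SQLite through Python's sqlite3 module rather than DB2, and the sample rows and the helper name run are made up.

```python
import sqlite3

# Original form: LEFT JOIN with OR in the ON clause.
OR_JOIN = """
SELECT GS.GuitarType, GD.GuitarColor
FROM Guitars GS
LEFT JOIN Guitar_Detail GD
  ON (GS.GuitarID = GD.GuitarID OR GS.GuitarID = GD.GuitarCatNum)
ORDER BY GS.GuitarID
"""

# Assumed rewrite: both key columns are stacked into one MatchKey column,
# so the join condition becomes a single equality. {set_op} is filled in
# with either UNION ALL or UNION.
UNION_JOIN = """
SELECT GS.GuitarType, GD.GuitarColor
FROM Guitars GS
LEFT JOIN (
    SELECT GuitarID AS MatchKey, GuitarColor FROM Guitar_Detail
    {set_op}
    SELECT GuitarCatNum AS MatchKey, GuitarColor FROM Guitar_Detail
) GD ON GS.GuitarID = GD.MatchKey
ORDER BY GS.GuitarID
"""

GUITARS = [(1, "Strat"), (2, "Tele"), (3, "Les Paul")]


def run(details, set_op):
    """Return (rows from the OR join, rows from the union-based rewrite)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Guitars (GuitarID INTEGER, GuitarType TEXT)")
    con.execute(
        "CREATE TABLE Guitar_Detail "
        "(GuitarID INTEGER, GuitarCatNum INTEGER, GuitarColor TEXT)"
    )
    con.executemany("INSERT INTO Guitars VALUES (?, ?)", GUITARS)
    con.executemany("INSERT INTO Guitar_Detail VALUES (?, ?, ?)", details)
    or_rows = con.execute(OR_JOIN).fetchall()
    union_rows = con.execute(UNION_JOIN.format(set_op=set_op)).fetchall()
    con.close()
    return or_rows, union_rows


# Each detail row fills only one of the two key columns.
EXCLUSIVE = [(1, None, "Red"), (None, 2, "Blue")]
# One detail row carries the same value in both key columns.
OVERLAPPING = [(1, 1, "Red")]

if __name__ == "__main__":
    print(run(EXCLUSIVE, "UNION ALL"))
    print(run(OVERLAPPING, "UNION ALL"))
    print(run(OVERLAPPING, "UNION"))
```

With the mutually exclusive rows, UNION ALL reproduces the OR join result. With the overlapping row, UNION ALL returns the matching guitar twice while UNION collapses it back to one row. Note that UNION deduplicates on the (MatchKey, GuitarColor) pair, so it would also merge two distinct detail rows that share a key and a color; whether that is acceptable depends on the data.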
