SQL - Most efficient way to find whether a pair of rows does NOT exist

Question

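I can't seem to find a situation similar to mine online. I have a table of orders, called orders, and a table of details about those orders, called order_detail. A certain type of order is defined by having one of two pairs of order details (value-unit pairs). So my order detail table might look like this: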

order_id | detail
---------|-------
1        | X  
1        | Y
1        | Z
2        | X
2        | Z
2        | B
3        | A
3        | Z
3        | B

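The two pairs that go together are (X & Y) and (A & B). What is an efficient way to retrieve only those order_ids that do NOT contain either of these pairs? For example, for the table above, I need to receive only order_id 2.

The only solution I can come up with is essentially to use two subqueries, each doing a self join: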

select distinct o.order_id
from orders o
where o.order_id not in (
    select distinct order_id
    from order_detail od1 where od1.detail=X
    join order_detail od2 on od2.order_id = od1.order_id and od2.detail=Y
) 
and o.order_id not in (
    select distinct order_id
    from order_detail od1 where od1.detail=A
    join order_detail od2 on od2.order_id = od1.order_id and od2.detail=B
)

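The problem is that performance is a concern: my order_detail table is huge, and I am inexperienced with query languages. Is there a faster way to do this with lower cardinality? I also have zero control over the schema of the tables, so I can't change anything there.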

Answer 1

I would use aggregation and having:

select order_id
from order_detail od
group by order_id
having sum(case when detail in ('X', 'Y') then 1 else 0 end) < 2 and
       sum(case when detail in ('A', 'B') then 1 else 0 end) < 2;

This assumes that an order has no duplicate rows with the same detail. If duplicates are possible:

select order_id
from order_detail od
group by order_id
having count(distinct case when detail in ('X', 'Y') then detail end) < 2 and
       count(distinct case when detail in ('A', 'B') then detail end) < 2;
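
To see why the duplicate-row assumption matters, here is a minimal sketch that runs both queries against SQLite through Python's sqlite3 module. The engine choice and the extra duplicated row (2, 'X') are assumptions made purely for illustration; the question does not name a platform.

```python
import sqlite3

# Hypothetical sample data: the question's table plus one duplicated row (2, 'X').
ROWS = [
    (1, "X"), (1, "Y"), (1, "Z"),
    (2, "X"), (2, "Z"), (2, "B"), (2, "X"),
    (3, "A"), (3, "Z"), (3, "B"),
]

# First query: counts rows, so a repeated detail is counted twice.
SUM_QUERY = """
select order_id
from order_detail
group by order_id
having sum(case when detail in ('X', 'Y') then 1 else 0 end) < 2 and
       sum(case when detail in ('A', 'B') then 1 else 0 end) < 2
"""

# Second query: counts distinct details, so repeats do not matter.
DISTINCT_QUERY = """
select order_id
from order_detail
group by order_id
having count(distinct case when detail in ('X', 'Y') then detail end) < 2 and
       count(distinct case when detail in ('A', 'B') then detail end) < 2
"""

def run(query):
    """Load ROWS into a fresh in-memory table and return the sorted order_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table order_detail (order_id integer, detail text)")
    conn.executemany("insert into order_detail values (?, ?)", ROWS)
    result = sorted(row[0] for row in conn.execute(query))
    conn.close()
    return result

sum_result = run(SUM_QUERY)            # [] : order 2 is wrongly dropped
distinct_result = run(DISTINCT_QUERY)  # [2]
print(sum_result, distinct_result)
```

With the duplicate present, the sum version treats the two X rows of order 2 as if they were an X & Y pair, while the count(distinct ...) version still returns order 2.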

Answer 2

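First, I'd like to emphasise that finding the most efficient query takes a combination of a good query and a good index. I often see questions here where people hope for magic to happen in only one or the other.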

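For example, of the various solutions, yours is the slowest when there are no indexes (after fixing the syntax errors), but it is no longer the slowest with an index on (detail, order_id).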

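Please also note that you are the one with the actual data and table structures. You will need to experiment with various combinations of queries and indexes to find what works best, not least because you haven't said which platform you're using, and results are likely to vary between platforms.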

[/rant]

Query

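Without further ado: Gordon Linoff has made some good suggestions. There is another option that is likely to offer similar performance. You said you can't control the schema, but you can use a sub-query to transform the data into a "friendlier structure".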

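Specifically, if you:

pivot the data so that you have one row per order_id,
with a column for each detail you want to check,
where each cell holds the count of rows in that order with that detail...

then your query is simply: where (x=0 or y=0) and (a=0 or b=0). The following uses a SQL Server table variable to demonstrate with sample data. The queries below work regardless of duplicate id, val pairs.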

/*Set up sample data*/
declare @t table (
    id int,
    val char(1)
)
insert @t(id, val)
values  (1, 'x'), (1, 'y'), (1, 'z'),
        (2, 'x'), (2, 'z'), (2, 'b'),
        (3, 'a'), (3, 'z'), (3, 'b')

/*Option 1 manual pivoting*/
select  t.id
from    (
        select  o.id,
                sum(case when o.val = 'a' then 1 else 0 end) as a,
                sum(case when o.val = 'b' then 1 else 0 end) as b,
                sum(case when o.val = 'x' then 1 else 0 end) as x,
                sum(case when o.val = 'y' then 1 else 0 end) as y
        from    @t o
        group by o.id
        ) t
where   (x = 0 or y = 0) and (a = 0 or b = 0)

/*Option 2 using Sql Server PIVOT feature*/
select  t.id
from    (
        select  id ,[a],[b],[x],[y]
        from    (select id, val from @t) src
                pivot (count(val) for val in ([a],[b],[x],[y])) pvt
        ) t
where   (x = 0 or y = 0) and (a = 0 or b = 0)

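Interestingly, the query plans for options 1 and 2 above are slightly different. This suggests they may have different performance characteristics on large data sets.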
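
The manual pivot in option 1 is plain SQL, so it can be tried outside SQL Server as well. The sketch below uses SQLite via Python's sqlite3 module, which is an assumption for illustration only; the table name sample_detail and the extra duplicate row (2, 'x') are made up to show that duplicates do not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the @t table variable used above.
conn.execute("create table sample_detail (id integer, val text)")
conn.executemany(
    "insert into sample_detail values (?, ?)",
    [
        (1, "x"), (1, "y"), (1, "z"),
        (2, "x"), (2, "z"), (2, "b"), (2, "x"),  # (2, 'x') is duplicated
        (3, "a"), (3, "z"), (3, "b"),
    ],
)

# One row per id, one count column per detail of interest.
PIVOT = """
select o.id,
       sum(case when o.val = 'a' then 1 else 0 end) as a,
       sum(case when o.val = 'b' then 1 else 0 end) as b,
       sum(case when o.val = 'x' then 1 else 0 end) as x,
       sum(case when o.val = 'y' then 1 else 0 end) as y
from sample_detail o
group by o.id
"""

# The intermediate "friendlier structure": (id, a, b, x, y).
pivoted = conn.execute(PIVOT + " order by id").fetchall()
print(pivoted)   # [(1, 0, 0, 1, 1), (2, 0, 1, 2, 0), (3, 1, 1, 0, 0)]

# Keep ids that are missing at least one member of each pair.
matching = [
    row[0]
    for row in conn.execute(
        f"select p.id from ({PIVOT}) p "
        "where (p.x = 0 or p.y = 0) and (p.a = 0 or p.b = 0) order by p.id"
    )
]
print(matching)  # [2]
conn.close()
```

Because the filter only tests each count against zero, the duplicated x row for id 2 (count 2 instead of 1) makes no difference.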

Indexes

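Note that the queries above will probably process the whole table, so indexes gain you very little. However, if the table has "long rows", an index on just the 2 columns you're working with means less data has to be read from disk.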

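The query structure you provided is likely to benefit from an index such as (detail, order_id), because the server can then check the NOT IN sub-query conditions more efficiently. How much it helps will depend on the distribution of data in your table.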
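
As a rough way to experiment with this, the sketch below builds the two tables in SQLite (an assumed engine, chosen only because it ships with Python), adds a (detail, order_id) index under a made-up name, runs a corrected form of the NOT IN query, and prints the query plan. Plan output differs between engines and versions, so treat it as something to inspect rather than a result to rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders (order_id integer primary key);
create table order_detail (order_id integer, detail text);
-- The index discussed above; the index name is hypothetical.
create index ix_order_detail_detail_order on order_detail (detail, order_id);
""")
conn.executemany("insert into orders values (?)", [(1,), (2,), (3,)])
conn.executemany(
    "insert into order_detail values (?, ?)",
    [
        (1, "X"), (1, "Y"), (1, "Z"),
        (2, "X"), (2, "Z"), (2, "B"),
        (3, "A"), (3, "Z"), (3, "B"),
    ],
)

# The question's approach with the syntax fixed: join first, then filter.
NOT_IN_QUERY = """
select o.order_id
from orders o
where o.order_id not in (
    select od1.order_id
    from order_detail od1
    join order_detail od2 on od2.order_id = od1.order_id and od2.detail = 'Y'
    where od1.detail = 'X'
)
and o.order_id not in (
    select od1.order_id
    from order_detail od1
    join order_detail od2 on od2.order_id = od1.order_id and od2.detail = 'B'
    where od1.detail = 'A'
)
"""

result = sorted(row[0] for row in conn.execute(NOT_IN_QUERY))
print(result)  # [2]

# Show how this engine intends to execute the query (text varies by version).
for row in conn.execute("explain query plan " + NOT_IN_QUERY):
    print(row[-1])
conn.close()
```

Running the same script with the create index line removed gives a quick before-and-after comparison of the plan, though only timings on your real data will tell you whether the index pays off.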

As a side note, I tested various query options, including fixed versions of yours and Gordon's (though only on a very small amount of data).

Without the above index, your query was the slowest in the batch.
With the above index, Gordon's second query was the slowest.

Alternative queries

Your query (fixed):

select distinct o.id
from @t o
where o.id not in (
    select  od1.id
    from    @t od1 
            inner join @t od2 on 
                od2.id = od1.id
            and od2.val='y'
    where   od1.val= 'x'
) 
and o.id not in (
    select  od1.id
    from    @t od1 
            inner join @t od2 on 
                od2.id = od1.id
            and od2.val='a'
    where   od1.val= 'b'
)

A hybrid of Gordon's first and second queries. It fixes the duplicate issue in the first and the performance issue in the second:

select id
from @t od
group by id
having (    sum(case when val in ('x') then 1 else 0 end) = 0
         or sum(case when val in ('y') then 1 else 0 end) = 0
        )
    and(    sum(case when val in ('a') then 1 else 0 end) = 0
         or sum(case when val in ('b') then 1 else 0 end) = 0
        )

Using INTERSECT and EXCEPT:

select  id
from    @t
except
(
    select  id
    from    @t
    where   val = 'a'
    intersect
    select  id
    from    @t
    where   val = 'b'
)
except
(
    select  id
    from    @t
    where   val = 'x'
    intersect
    select  id
    from    @t
    where   val = 'y'
)
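
The set logic above is "all ids, minus ids having both a and b, minus ids having both x and y". Here is a sketch of the same idea on SQLite through Python's sqlite3 module (an assumed engine, with a made-up table name). As far as I know SQLite does not accept parenthesised compound selects, so each INTERSECT is wrapped in a derived table instead; the remaining EXCEPTs are applied left to right.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the @t table variable used above.
conn.execute("create table sample_detail (id integer, val text)")
conn.executemany(
    "insert into sample_detail values (?, ?)",
    [
        (1, "x"), (1, "y"), (1, "z"),
        (2, "x"), (2, "z"), (2, "b"),
        (3, "a"), (3, "z"), (3, "b"),
    ],
)

SET_QUERY = """
select id from sample_detail
except
select id from (
    select id from sample_detail where val = 'a'
    intersect
    select id from sample_detail where val = 'b'
)
except
select id from (
    select id from sample_detail where val = 'x'
    intersect
    select id from sample_detail where val = 'y'
)
"""

remaining = sorted(row[0] for row in conn.execute(SET_QUERY))
print(remaining)  # [2]
conn.close()
```

EXCEPT and INTERSECT return distinct rows, so this form is also unaffected by duplicate id, val pairs.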
