Minor change to SQL Server query causes extremely slow execution time

Question

I don't understand what's functionally different about these two queries that would make them perform so differently. First, my initial query:

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT DISTINCT SCode FROM XTransactions_01
            WHERE Last_Mdt > '2012-01-01'
                AND SCode IS NOT NULL
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather than several minutes, I played around with it and came up with this query, which is, at least in my eyes, equivalent:

SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
    AND SCode IS NOT NULL

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT Scode FROM #TEMP1
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

DROP TABLE #TEMP1

The difference is that this query takes 2 seconds to execute versus the 13 minutes above. What's going on here?

Answer 1

In both cases you're using a "correlated subquery", which executes for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.

Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01', then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see if the row should be included.

Your second query executes the correlated subquery the same number of times, but it's faster because it's executing against a smaller table.
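
If you want to check this on your own server, you can compare how much work each version does. Here is a minimal sketch using SET STATISTICS IO and SET STATISTICS TIME, which are standard SQL Server session options; the query is simply the first one from the question. How often the subquery is really evaluated depends on the plan the optimizer picks, so treat the per-row picture above as a mental model rather than a guarantee.

```sql
-- Report per-table reads and CPU/elapsed time for each statement in this session
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The slow version from the question; run the #TEMP1 version the same way to compare
SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT DISTINCT SCode FROM XTransactions_01
            WHERE Last_Mdt > '2012-01-01'
                AND SCode IS NOT NULL
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc;

-- Switch the reporting back off when done
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages output then lists scan count and logical reads per table. A very large number against XTransactions_01 in one version and not the other would point at the subquery being re-read many times.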

Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then omit any rows where the left-joined column is null. This gets rid of the correlated subquery.

SELECT * FROM XSales_Code SC
LEFT JOIN (
    SELECT DISTINCT SCode FROM XTransactions_01
    WHERE Last_Mdt > '2012-01-01'
        AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc

This is hard to explain, but try running the query above without the second-to-last line (AND whatevs.SCode IS NULL) and you'll see that whatevs.SCode has a value when the condition is "IN" and is NULL when the condition is "NOT IN".
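
If you'd rather see that on a tiny example first, here is a sketch with made-up data. The table variables @Sales and @Trans and their contents are hypothetical stand-ins for XSales_Code and XTransactions_01, not your real tables.

```sql
-- Hypothetical miniature data, only to illustrate the join
DECLARE @Sales TABLE (SCode varchar(10));
DECLARE @Trans TABLE (SCode varchar(10));

INSERT INTO @Sales (SCode) VALUES ('A'), ('B'), ('C');
INSERT INTO @Trans (SCode) VALUES ('A'), ('A'), ('C');

-- Without the IS NULL filter: every @Sales row is kept,
-- and MatchedCode is NULL wherever there is no matching transaction
SELECT s.SCode AS SalesCode, t.SCode AS MatchedCode
FROM @Sales s
LEFT JOIN (SELECT DISTINCT SCode FROM @Trans) t ON s.SCode = t.SCode
ORDER BY s.SCode;
-- A | A
-- B | NULL
-- C | C

-- With the IS NULL filter: only the unmatched row survives, i.e. the "NOT IN" rows
SELECT s.SCode AS SalesCode
FROM @Sales s
LEFT JOIN (SELECT DISTINCT SCode FROM @Trans) t ON s.SCode = t.SCode
WHERE t.SCode IS NULL;
-- B
```

Run the whole thing as one batch, since table variables only live for the batch that declares them.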

Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.
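
For comparison, the same filter can also be written with NOT EXISTS. This is just a sketch of an alternative and not something I've timed against your data; whether it beats the left join depends on your indexes and the plan SQL Server chooses. The alias T is only for illustration.

```sql
-- Same filter written as NOT EXISTS; the inner query references SC.SCode,
-- so this one really is correlated with the outer row
SELECT SC.*
FROM XSales_Code SC
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND NOT EXISTS (
      SELECT 1
      FROM XTransactions_01 T
      WHERE T.SCode = SC.SCode
        AND T.Last_Mdt > '2012-01-01'
  )
ORDER BY SC.Last_Mdt DESC;
```

One difference to be aware of: if SC.SCode can be NULL, the NOT EXISTS and left join versions keep those rows, while NOT IN against a non-empty list drops them, because NULL NOT IN (...) never evaluates to true.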
