Minor change to SQL SERVER query causes extremely slow execution time

I don't understand what's functionally different about these 2 queries that would make them perform so differently. First, my initial query:

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT DISTINCT SCode FROM XTransactions_01
            WHERE Last_Mdt > '2012-01-01'
                AND SCode IS NOT NULL
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather than several minutes, I played around with it and came up with this query, which is, at least in my eyes, equivalent:

SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
    AND SCode IS NOT NULL

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT Scode FROM #TEMP1
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

DROP TABLE #TEMP1

The difference is that this query takes 2 seconds to execute versus the 13 minutes above. What's going on here?

In both cases you're using a "correlated subquery", which executes for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.

Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01', then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see if the row should be included.

Your second query executes the correlated subquery the same number of times, but it's faster because it's executing against a smaller table.
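If you want to check how the two versions actually behave on your server, one option is to turn on the I/O and timing statistics before running each of them and compare the logical reads reported for XTransactions_01 and #TEMP1. This is only a sketch of the measurement; the numbers you get depend on your data, indexes, and the plan the optimizer picks.

```sql
-- Report per-table logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the original NOT IN query (or the #TEMP1 version) here, unchanged:
SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT DISTINCT SCode FROM XTransactions_01
            WHERE Last_Mdt > '2012-01-01'
                AND SCode IS NOT NULL
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc;

-- Switch the extra output off again when you are done.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The statistics appear on the Messages tab in SQL Server Management Studio, next to the result grid.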

Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then keep only the rows where the left-joined column is null. This gets rid of the correlated subquery.

SELECT * FROM XSales_Code SC
LEFT JOIN (
    SELECT DISTINCT SCode FROM XTransactions_01
    WHERE Last_Mdt > '2012-01-01'
        AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc

This is hard to explain, but try running the query above without the second-to-last line ( AND whatevs.SCode IS NULL ) and you'll see that whatevs.SCode has a value when the condition is "IN" and is null when the condition is "NOT IN".
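The experiment described above can be sketched like this. The MatchedSCode and Membership column aliases are names I made up for illustration; everything else comes from the query above.

```sql
-- Same left join as before, but without the "whatevs.SCode IS NULL" filter,
-- so both matched and unmatched rows of XSales_Code are returned.
SELECT
    SC.SCode,
    whatevs.SCode AS MatchedSCode,   -- hypothetical alias: NULL when no match was found
    CASE WHEN whatevs.SCode IS NULL
         THEN 'NOT IN'
         ELSE 'IN'
    END AS Membership                -- hypothetical alias, for readability only
FROM XSales_Code SC
LEFT JOIN (
    SELECT DISTINCT SCode FROM XTransactions_01
    WHERE Last_Mdt > '2012-01-01'
        AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
ORDER BY SC.Last_Mdt DESC;
```

One caveat: a row whose SC.SCode is itself NULL never matches the join, so it shows up as 'NOT IN' here, whereas the original NOT IN predicate would evaluate to unknown for that row and drop it. If XSales_Code.SCode can be NULL, the two forms are not strictly equivalent.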

Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.
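As an example of a correlated subquery that is usually a reasonable fit here, the same filter can also be written with NOT EXISTS. This is a different technique from the left join above, not something taken from the question, and I haven't measured it against your tables, so treat it as a sketch to try rather than a guaranteed improvement. The alias T is my own.

```sql
-- NOT EXISTS variant: the inner query references SC.SCode, so it is correlated.
SELECT SC.*
FROM XSales_Code SC
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND NOT EXISTS (
      SELECT 1
      FROM XTransactions_01 T          -- T is a hypothetical alias
      WHERE T.SCode = SC.SCode         -- NULL SCode values never satisfy this equality
        AND T.Last_Mdt > '2012-01-01'
  )
ORDER BY SC.Last_Mdt DESC;
```

Like the left join version, this keeps rows where SC.SCode is NULL, which the NOT IN version would exclude.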
