简体   繁体   中英

Minor change to SQL SERVER query causes extremely slow execution time

I dont understand whats functionally different about these 2 queries that would make them so different. First my initial query:

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT DISTINCT SCode FROM XTransactions_01
            WHERE Last_Mdt > '2012-01-01'
                AND SCode IS NOT NULL
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather then several minutes I played around with and made this query which is, at least in my eyes, equivalent:

SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
    AND SCode IS NOT NULL

SELECT * FROM XSales_Code SC
    WHERE SC.Status = 1
        AND SC.SCode NOT IN
            (
            SELECT Scode FROM #TEMP1
            )
        AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc

DROP TABLE #TEMP1

The difference is this query takes 2 seconds to execute vs the 13 minutes above. Whats going on here?

In both cases you're using a "correlated subquery", which executes for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.

Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01' , then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see if the row should be included.

Your second query executes the correlated subquery the same number of times, but it's faster because it's executing against a smaller table.

Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then omit any rows where the left-joined column is null. This gets rid of the correlated subquery.

SELECT * FROM XSales_Code SC
LEFT JOIN (
    SELECT DISTINCT SCode FROM XTransactions_01
    WHERE Last_Mdt > '2012-01-01'
        AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
  AND SC.Last_Mdt < '2014-01-01'
  AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc

This is hard to explain, but try running the query above without the second-to-last line ( AND whatevs.SCode IS NULL ) and you'll see how whatevs.SCODE has a value when the condition is "IN" and is null when the condition is "NOT IN".

Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM