Minor change to SQL Server query causes extremely slow execution time
I don't understand what's functionally different about these two queries that would make their execution times so different. First, my initial query:
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
This took 13 minutes and 6 seconds to execute. Since I'm used to simple queries like this taking several seconds rather than several minutes, I played around with it and came up with this query, which is, at least in my eyes, equivalent:
SELECT DISTINCT SCode INTO #TEMP1 FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
SELECT * FROM XSales_Code SC
WHERE SC.Status = 1
AND SC.SCode NOT IN
(
SELECT Scode FROM #TEMP1
)
AND SC.Last_Mdt < '2014-01-01'
ORDER BY Last_Mdt desc
DROP TABLE #TEMP1
The difference is that this query takes 2 seconds to execute versus the 13 minutes above. What's going on here?
In both cases the NOT IN subquery ends up behaving like a "correlated subquery", which executes for every row in XSales_Code that passes the Status = 1 AND Last_Mdt < '2014-01-01' conditions.
Think of it like this: XSales_Code is filtered by Status = 1 AND Last_Mdt < '2014-01-01', then SQL Server scans each row of this intermediate result, and for every single row it executes your SELECT DISTINCT SCode FROM XTransactions_01... query to see if the row should be included.
Your second query executes the correlated subquery the same number of times, but it's faster because it's executing against a smaller table.
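To make that mental picture concrete, here is a rough Python sketch of the two cases described above. It is only a toy cost model with made-up data, not how SQL Server's optimizer or execution engine actually works: the first function re-scans the whole transaction list for every outer row, while the second builds the distinct set once (standing in for #TEMP1) and probes it per row. The function names and row counts are hypothetical.

```python
# Toy model of the answer's mental picture; NOT SQL Server's real execution engine.
# All data and names below are made up for illustration.

def not_in_rescan(outer_codes, transaction_codes):
    """Re-scan the full transaction list for every outer row."""
    rows_touched = 0
    kept = []
    for code in outer_codes:
        found = False
        for t in transaction_codes:      # inner scan repeated per outer row
            rows_touched += 1
            if t == code:
                found = True
                break
        if not found:
            kept.append(code)
    return kept, rows_touched

def not_in_materialized(outer_codes, transaction_codes):
    """Build the distinct set once (like #TEMP1), then probe it per outer row."""
    distinct = set(transaction_codes)    # one pass over the big table
    rows_touched = len(transaction_codes)
    kept = []
    for code in outer_codes:
        rows_touched += 1                # one cheap lookup per outer row
        if code not in distinct:
            kept.append(code)
    return kept, rows_touched

outer = list(range(1000))                        # hypothetical XSales_Code.SCode values
transactions = [i % 500 for i in range(10_000)]  # hypothetical XTransactions_01.SCode values

kept_a, cost_a = not_in_rescan(outer, transactions)
kept_b, cost_b = not_in_materialized(outer, transactions)

print("same result:", kept_a == kept_b)
print("rows touched, re-scan per row:", cost_a)
print("rows touched, materialize once:", cost_b)
```

Both functions return the same rows; only the amount of work differs, which is the point of the temp-table version in the question.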
Generally, the fastest way to do a NOT IN query is to left join to the "not in" subquery and then keep only the rows where the left-joined column is null. This gets rid of the correlated subquery.
SELECT * FROM XSales_Code SC
LEFT JOIN (
SELECT DISTINCT SCode FROM XTransactions_01
WHERE Last_Mdt > '2012-01-01'
AND SCode IS NOT NULL
) whatevs ON SC.SCode = whatevs.SCode
WHERE SC.Status = 1
AND SC.Last_Mdt < '2014-01-01'
AND whatevs.SCode IS NULL
ORDER BY Last_Mdt desc
This is hard to explain, but try running the query above without the second-to-last line (AND whatevs.SCode IS NULL) and you'll see how whatevs.SCode has a value for rows whose SCode appears in the subquery (the "IN" case) and is null for rows whose SCode does not (the "NOT IN" case).
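If you don't have the real tables handy, the following sketch shows the same effect on a tiny made-up dataset. It uses Python's built-in sqlite3 module purely so it can run anywhere, which is an assumption on my part: SQLite is not SQL Server, and the four sample rows per table are invented. It runs the left join without the IS NULL filter, then with it, and compares the result to the original NOT IN form.

```python
import sqlite3

# In-memory SQLite database with made-up sample rows (not the asker's data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE XSales_Code (SCode TEXT, Status INTEGER, Last_Mdt TEXT);
    CREATE TABLE XTransactions_01 (SCode TEXT, Last_Mdt TEXT);
    INSERT INTO XSales_Code VALUES
        ('A', 1, '2013-05-01'),
        ('B', 1, '2013-06-01'),
        ('C', 1, '2013-07-01'),
        ('D', 0, '2013-08-01');
    INSERT INTO XTransactions_01 VALUES
        ('A', '2012-03-01'),
        ('A', '2013-01-15'),
        ('C', '2011-12-31'),
        (NULL, '2012-09-09');
""")

SUBQUERY = """
    SELECT DISTINCT SCode FROM XTransactions_01
    WHERE Last_Mdt > '2012-01-01' AND SCode IS NOT NULL
"""

# Left join WITHOUT the IS NULL filter: whatevs.SCode is filled in for
# matched rows and NULL (None in Python) for unmatched rows.
joined = conn.execute(f"""
    SELECT SC.SCode, whatevs.SCode
    FROM XSales_Code SC
    LEFT JOIN ({SUBQUERY}) whatevs ON SC.SCode = whatevs.SCode
    WHERE SC.Status = 1 AND SC.Last_Mdt < '2014-01-01'
    ORDER BY SC.Last_Mdt DESC
""").fetchall()

# Same join WITH the IS NULL filter: only the unmatched rows survive.
anti_join = conn.execute(f"""
    SELECT SC.SCode
    FROM XSales_Code SC
    LEFT JOIN ({SUBQUERY}) whatevs ON SC.SCode = whatevs.SCode
    WHERE SC.Status = 1 AND SC.Last_Mdt < '2014-01-01'
      AND whatevs.SCode IS NULL
    ORDER BY SC.Last_Mdt DESC
""").fetchall()

# Original NOT IN form, for comparison.
not_in = conn.execute(f"""
    SELECT SC.SCode
    FROM XSales_Code SC
    WHERE SC.Status = 1 AND SC.Last_Mdt < '2014-01-01'
      AND SC.SCode NOT IN ({SUBQUERY})
    ORDER BY SC.Last_Mdt DESC
""").fetchall()

print(joined)
print(anti_join)
print(not_in)
```

On this sample data, only 'A' has a qualifying transaction, so it is the only row with a non-null whatevs.SCode, and the IS NULL filter leaves the same rows that NOT IN returns.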
Finally, I want to stress that correlated subqueries aren't inherently evil. Generally they work just fine for an IN condition and plenty of other use cases, but for a NOT IN condition they tend to be slow.
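One more reason NOT IN needs care, and the reason the SCode IS NOT NULL filter inside the subquery matters: under SQL's three-valued logic, a single NULL in the subquery result makes NOT IN unable to return true for any non-matching row. The sketch below shows this on made-up tables (sales and transactions are hypothetical names), again using Python's sqlite3 as a stand-in that follows the standard behaviour here rather than SQL Server itself.

```python
import sqlite3

# Made-up tables; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (SCode TEXT);
    CREATE TABLE transactions (SCode TEXT);
    INSERT INTO sales VALUES ('A'), ('B');
    INSERT INTO transactions VALUES ('A'), (NULL);
""")

# Subquery can return NULL: 'B' NOT IN ('A', NULL) evaluates to unknown,
# so the row is filtered out along with 'A'.
without_filter = conn.execute("""
    SELECT SCode FROM sales
    WHERE SCode NOT IN (SELECT SCode FROM transactions)
""").fetchall()

# NULLs excluded from the subquery, as in the question's queries.
with_filter = conn.execute("""
    SELECT SCode FROM sales
    WHERE SCode NOT IN (SELECT SCode FROM transactions WHERE SCode IS NOT NULL)
""").fetchall()

print(without_filter)
print(with_filter)
```

The left join form shown earlier does not have this problem, since a NULL on the right side simply never matches in the join condition.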