Query performance: CTE using ROW_NUMBER() to select first row
We have three environments. In two of them my SQL query takes 30 to 38 seconds to run, but in the third it never finishes and I have to cancel it. The query has two parts, a CTE and a very simple select from a table; both the CTE and the select use the same table.
Could you please tell me why it takes so long, and how I can improve the query?
ALTER VIEW [fact].[vPurchase]
AS
WITH VKPL AS
(
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
ROW_NUMBER() OVER(PARTITION BY [Delivery_FK] ORDER BY iv.UpdateDate) AS rk
FROM
[fact].[KRMFact] iv
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
)
SELECT
-- .... here are some columns
Delivery_FK,
Product_FK,
CAST(column2 AS VARCHAR) AS column2,
f.[UpdateDate] AS [Update date]
FROM
[fact].[KRMFact] f
LEFT JOIN
VKPL v ON f.Delivery_FK = v.Delivery_FK
This is guesswork.
I guess the environment where this query is slow is the one with lots of production data in it.
I guess some index on your KRMFact table will, maybe, help you. Here's how to figure out what you need: SQL Server Management Studio (SSMS) has a feature to show you a query's execution plan. Put your query (not simplified, please, the actual query) into SSMS, right click and choose "Include Actual Execution Plan." Then run the query. The execution plan display may recommend an index for you to create to get this query to run faster.
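In the environment where the query never finishes, an actual plan is hard to get because the query has to complete first. One option is to ask for the estimated plan, which SQL Server produces without executing the statement. A minimal sketch, assuming you run it in SSMS (GO is the SSMS batch separator) and that selecting from the view stands in for your real query:

```sql
-- Return the estimated plan as XML instead of executing the statements that follow.
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO

-- Stand-in for the actual query; replace with the real statement you are tuning.
SELECT * FROM [fact].[vPurchase];
GO

-- Switch back to normal execution.
SET SHOWPLAN_XML OFF;
GO
```

Comparing this plan between the fast and slow environments may show where they differ, for example a scan in one and a seek in the other.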
I guess you're trying to find rows with the earliest values of UpdateDate.
Your subquery
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
ROW_NUMBER() OVER(PARTITION BY [Delivery_FK] ORDER BY iv.UpdateDate) AS rk
FROM
[fact].[KRMFact] iv
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
looks like it picks out the row with the earliest KRMFact.UpdateDate for each value of KRMFact.Delivery_FK. That's what the ROW_NUMBER() OVER... WHERE rk=1 language does.
If my guess about that is correct you can do that a different way, which may be more efficient.
SELECT *
FROM
(SELECT
iv.[Delivery_FK],
1 AS column2,
1 AS rk
FROM
[fact].[KRMFact] iv
JOIN ( SELECT Delivery_FK, MIN(UpdateDate) first_update
FROM [fact].[KRMFact]
GROUP BY Delivery_FK
) first_update ON iv.Delivery_FK = first_update.Delivery_FK
AND iv.UpdateDate = first_update.first_update
LEFT JOIN
[dimension].[Product] pr ON iv.Product_FK =pr.Product_SK
LEFT JOIN
[dimension].[Delivery] le ON le.Delivery_FK = iv.Delivery_FK
WHERE
pr.Product_Key = '740') X
WHERE
rk = 1
You should probably try out the old and new versions of the subquery to determine whether they yield the same results.
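One way to run that comparison is with EXCEPT. This is only a sketch: the CTE names old_version and new_version are made up, the join to [dimension].[Delivery] is left out for brevity, and the MIN() version is written here with the join matching both Delivery_FK and the earliest UpdateDate. Note one way the two can differ: ROW_NUMBER() ranks only the rows that survive the Product_Key filter, while MIN() here looks at every row for the delivery.

```sql
-- Delivery_FK values returned by the ROW_NUMBER() version but not by the MIN() version.
-- Swap the two SELECTs around EXCEPT to check the opposite direction.
WITH old_version AS (
    SELECT X.Delivery_FK
    FROM (SELECT iv.[Delivery_FK],
                 ROW_NUMBER() OVER (PARTITION BY iv.[Delivery_FK]
                                    ORDER BY iv.UpdateDate) AS rk
          FROM [fact].[KRMFact] iv
          JOIN [dimension].[Product] pr ON iv.Product_FK = pr.Product_SK
          WHERE pr.Product_Key = '740') X
    WHERE X.rk = 1
),
new_version AS (
    SELECT iv.[Delivery_FK]
    FROM [fact].[KRMFact] iv
    JOIN (SELECT Delivery_FK, MIN(UpdateDate) AS first_update
          FROM [fact].[KRMFact]
          GROUP BY Delivery_FK) fu
      ON iv.Delivery_FK = fu.Delivery_FK
     AND iv.UpdateDate = fu.first_update
    JOIN [dimension].[Product] pr ON iv.Product_FK = pr.Product_SK
    WHERE pr.Product_Key = '740'
)
SELECT Delivery_FK FROM old_version
EXCEPT
SELECT Delivery_FK FROM new_version;
```

If both directions return no rows, the two versions pick the same deliveries. EXCEPT compares distinct values, so it will not reveal duplicate rows caused by ties on UpdateDate.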
If you use the subquery I suggest, this index may help it run faster by optimizing the new sub-sub-query's MIN()... GROUP BY operation.
CREATE INDEX x_KRMFact_Product_Update
ON [fact].[KRMFact]
([Product_FK],[UpdateDate])
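The MIN()... GROUP BY sub-sub-query groups by Delivery_FK and reads UpdateDate, so an index keyed on those two columns is the kind that usually supports that operation directly. A sketch, untested against your data; the index name is made up, and the execution plan should confirm whether SQL Server actually uses it:

```sql
-- Hypothetical index name. Key order: the GROUP BY column first, then the column fed to MIN().
CREATE INDEX x_KRMFact_Delivery_Update
ON [fact].[KRMFact]
([Delivery_FK], [UpdateDate]);
```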
By the way, WHERE pr.Product_Key = '740' turns your LEFT JOIN [dimension].[Product] operation into an ordinary inner JOIN: rows with no matching product get a NULL Product_Key, and the filter throws them away.
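Writing the join the way it actually behaves makes the intent clearer to readers. A sketch of the inner query with that one change, assuming the same tables and columns as in the question; I have also qualified Delivery_FK in the PARTITION BY with the iv alias, which is my addition:

```sql
-- Same filter as before, with the Product join written as the inner join it effectively is.
SELECT
    iv.[Delivery_FK],
    1 AS column2,
    ROW_NUMBER() OVER (PARTITION BY iv.[Delivery_FK] ORDER BY iv.UpdateDate) AS rk
FROM [fact].[KRMFact] iv
INNER JOIN [dimension].[Product] pr
    ON iv.Product_FK = pr.Product_SK
LEFT JOIN [dimension].[Delivery] le
    ON le.Delivery_FK = iv.Delivery_FK
WHERE pr.Product_Key = '740';
```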