
SQL - Using CTE and effectively select rows at a specific row_number()

In this scenario, is the SELECT TOP necessary within the CTE? (A micro-optimization, I know...)

DECLARE @pageSize INT, @currentPage INT, @top INT;
SET @pageSize = 10;
SET @currentPage = 150;
SET @top = @pageSize * @currentPage + @pageSize;

WITH t AS
(
    -- the TOP(@top) here is the one in question
    SELECT TOP(@top) ID, name,
           ROW_NUMBER() OVER (ORDER BY ID) AS _row
    FROM dbo.[User]
)
SELECT TOP(@pageSize) *
FROM t
WHERE t._row > (@top-@pageSize)
ORDER BY t.ID;

The above returns 10 (@pageSize) rows, starting after row number (@top-@pageSize), in a specific order and with a row-number column. Does the CTE know that the "SELECT TOP" and the WHERE clause outside the CTE are coming, so that the CTE never produces more rows, in that order, than needed?

Basically I am asking about the ROW_NUMBER function: does it avoid computing a row number for rows that are not returned (if I were to have millions of rows...)? And if I were to select the top 100 in the CTE, would row_number still be calculated for all the million rows in the table?

I have tried with and without "SELECT TOP(@top)" in the CTE, inside a loop of 10,000 runs, without seeing any difference in time usage. However, I have only 38,000 rows in the table at the moment.

Edit: So the result:

-- assumes @pageSize and @top are declared and set as in the first query
WITH t AS
(
    -- DO A TOP() WITH AN ORDER BY IN THE CTE
    SELECT TOP(@top) ID, name,
           ROW_NUMBER() OVER (ORDER BY ID) AS _row
    FROM dbo.[User]
    ORDER BY ID
)
-- SELECTING TOP N FROM THE CTE, WHERE ROW-NUMBER IS ... DUE TO THE CTE IS IN ORDER ALREADY
SELECT TOP(@pageSize) *
FROM t
WHERE t._row > (@top-@pageSize);

This could probably be more efficient if I ordered them "backwards" and selected the "bottom @pageSize" of the CTE, which would leave out the WHERE clause... but I would need to test whether that is actually faster...

The use of top without an order by is discouraged. There is no guarantee that you will get the rows that you want, so you should not include the top. Or, you should include an order by id, if that is the ordering that you want.

The use of top doesn't affect the row_number() calculation, because that calculation is going to be done before the top is applied. You can imagine having another window function there, such as sum() over (), to understand that the top cannot generally be applied before the row_number(), and finding the circumstances where it is safe would be hard work.
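The ordering argument above can be sketched outside the database. The following is a toy Python model, not SQL Server code: the `rows` data and both function names are made up for illustration. It shows that applying top before the window functions happens to leave row_number() unchanged when the orderings match, but changes the result of a sum() over ().

```python
# Toy model (assumption: plain Python lists stand in for a table).
rows = [{"id": i, "amount": 10 * i} for i in range(1, 6)]  # ids 1..5, amounts 10..50


def window_then_top(rows, n):
    """Logical order: compute the window functions over all rows, then keep n rows."""
    ordered = sorted(rows, key=lambda r: r["id"])
    total = sum(r["amount"] for r in ordered)  # like sum(amount) over ()
    numbered = [dict(r, rn=i, total=total) for i, r in enumerate(ordered, start=1)]
    return numbered[:n]


def top_then_window(rows, n):
    """Hypothetical shortcut: keep n rows first, then compute the window functions."""
    kept = sorted(rows, key=lambda r: r["id"])[:n]
    total = sum(r["amount"] for r in kept)
    return [dict(r, rn=i, total=total) for i, r in enumerate(kept, start=1)]


logical = window_then_top(rows, 2)
shortcut = top_then_window(rows, 2)

# Row numbers agree, but the totals do not.
print([r["rn"] for r in logical], [r["rn"] for r in shortcut])
print(logical[0]["total"], shortcut[0]["total"])
```

In this model the row numbers survive the reordering only because the top and the row_number() use the same ordering; the total does not survive it at all.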

If you have a supporting index on ID, you do not have to read and enumerate the whole table. SQL Server will have to read the table up to and including the page you want. So if, for example, you want page 1 (rows 11 to 20), the query will only fetch 20 rows. And that is true even if you don't use the top in the CTE.

A table to test on with some data:

create table dbo.[User]
(
  ID int identity primary key,
  Name nvarchar(128)
)

go

insert into dbo.[User](Name)
select top(1000) Name
from sys.all_objects

A query without the redundant top expressions:

DECLARE @pageSize INT, @currentPage INT, @top INT;
SET @pageSize = 10;
SET @currentPage = 1;
SET @top = @pageSize * @currentPage + @pageSize;
with C as
(
  select U.ID,
         U.Name,
         row_number() over(order by U.ID) as rn
  from dbo.[User] as U
)
select C.ID,
       C.Name
from C
where C.rn > @pageSize * @currentPage and 
      C.rn <= @pageSize * (@currentPage + 1);

This will give you a query plan like this:

[Image: query plan for page 1]

The number by each operator is the number of rows actually fetched. The clustered index scan reads 20 rows ordered by ID. Segment and Sequence Project enumerate the rows. Top is the operator that makes sure that no more than 20 rows are fetched. The filter removes rows 1 to 10 and lets rows 11 to 20 through.

If we instead try to get page 5 (@currentPage = 5 to get rows 51 to 60), the plan will look like this:

[Image: query plan for page 5]

The Top operator makes sure only 60 rows are read from the clustered index, and the filter removes the first 50 rows to return the last 10 rows.

Using your last query with the extra top expressions will not add anything of value, only one extra redundant Top operator.

[Image: query plan with the extra Top operator]

The key to understanding what is going on in the query plan is to know that execution is done from left to right, demanding one row at a time. That is how the Top operator can stop the clustered index scan when enough rows have been returned.
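The row-at-a-time behaviour can be imitated with Python generators. This is only a rough sketch of the idea, not how SQL Server is implemented: the operator functions, the `stats` counter and the in-memory `table` are all hypothetical stand-ins, chained in the same order as the plan described above (scan, sequence project, top, filter).

```python
from itertools import islice


def clustered_index_scan(table, stats):
    """Yield rows in ID order, counting how many rows are actually read."""
    for row in table:
        stats["rows_read"] += 1
        yield row


def sequence_project(rows):
    """Attach rn, a stand-in for row_number() over (order by ID)."""
    for rn, (row_id, name) in enumerate(rows, start=1):
        yield row_id, name, rn


def top(rows, n):
    """Stop asking the child operator for rows once n rows have passed."""
    return islice(rows, n)


def filter_rn(rows, lower):
    """Let through only rows whose row number is above the lower bound."""
    for row in rows:
        if row[2] > lower:
            yield row


def fetch_page(table, page_size, current_page):
    stats = {"rows_read": 0}
    lower = page_size * current_page
    upper = page_size * (current_page + 1)
    scan = clustered_index_scan(table, stats)
    plan = filter_rn(top(sequence_project(scan), upper), lower)
    return list(plan), stats["rows_read"]


# Assumed test data: 1000 rows already stored in ID order.
table = [(i, f"name{i}") for i in range(1, 1001)]

page, rows_read = fetch_page(table, 10, 1)
print(rows_read, page[0][0], page[-1][0])
```

Because each operator only pulls from its child when it is itself asked for a row, the scan in this sketch stops as soon as the top has handed over its last row; the remaining rows of the table are never touched.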

