
What is the mathematical relationship between "no. of rows affected" and "execution time" of a SQL query?

The query itself remains constant, i.e. it does not change.

For example, a SELECT query takes 30 minutes if it returns 10,000 rows.

Would the same query take 1 hour if it has to return 20,000 rows?

I am interested in knowing the mathematical relation between the number of rows (N) and the execution time (T), keeping all other parameters constant (K).

i.e. T = N*K, or

T = N*K + C, or

any other formula?

I am reading http://research.microsoft.com/pubs/76556/progress.pdf in case it helps. If anybody can make sense of it before I do, please reply. Thanks...

Well, that is a good question :), but there is no exact formula, because it depends on the execution plan.

The SQL query optimizer could choose a different execution plan for a query that returns a different number of rows. I guess that if the execution plan is the same for both queries and you have some "lab" conditions, then the time growth could be linear. You should research SQL execution plans and statistics further.
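One way to test the linear guess is to time the same query at several row counts and fit T = N*K + C by least squares. The sketch below is only an illustration: `fit_linear` is a hypothetical helper and the timings are made-up numbers, not measurements from any real database. It also assumes the execution plan did not change between runs.

```python
def fit_linear(samples):
    """Least-squares fit of T = K*N + C to (rows, seconds) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_rows = sum(rows for rows, _ in samples) / n
    mean_secs = sum(secs for _, secs in samples) / n
    var_rows = sum((rows - mean_rows) ** 2 for rows, _ in samples)
    if var_rows == 0:
        raise ValueError("row counts must differ")
    cov = sum((rows - mean_rows) * (secs - mean_secs) for rows, secs in samples)
    k = cov / var_rows           # seconds per additional row
    c = mean_secs - k * mean_rows  # fixed overhead in seconds
    return k, c


# Hypothetical (rows returned, seconds elapsed) timings for one query.
measurements = [(5_000, 920.0), (10_000, 1800.0), (15_000, 2690.0)]
k, c = fit_linear(measurements)

# Extrapolate to 20,000 rows; only meaningful if the plan stays the same.
predicted_20k = k * 20_000 + c
print(f"K = {k:.4f} s/row, C = {c:.1f} s, T(20000) = {predicted_20k:.0f} s")
```

If the points fall far from the fitted line, that suggests the plan or the caching behaviour changed between runs, and the linear model should not be trusted.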

Take the very simple example of reading every row in a single table.

In the worst case, you will have to read every page of the table from your underlying storage. The worst case for this is having to do a random seek for every page. The seek time will then dominate all other factors, so you can estimate the total time:

time ~= seek time x number of data pages

Assuming your rows are of a fairly regular size, this is linear in the number of rows.

However, databases do a number of things to try to avoid this worst case. For example, in SQL Server, table storage is often allocated in extents of 8 consecutive pages. A hard drive has a much faster streaming IO rate than random IO rate. If you have a clustered index, reading the pages in cluster order tends to involve a lot more streaming IO than random IO.
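As a rough sketch of that worst-case estimate, the function below turns a row count into a page count and multiplies by the seek time. The row size and seek time are made-up illustrative values, the function name is hypothetical, and the 8 KB page size is SQL Server's (the sketch ignores page header overhead).

```python
import math


def worst_case_scan_seconds(rows, row_bytes, seek_seconds, page_bytes=8192):
    """time ~= seek time x number of data pages, one random seek per page."""
    rows_per_page = max(1, page_bytes // row_bytes)  # assumes fixed-size rows
    pages = math.ceil(rows / rows_per_page)
    return pages * seek_seconds


# Illustrative numbers only: 200-byte rows, 8 ms per random seek.
t_10k = worst_case_scan_seconds(10_000, row_bytes=200, seek_seconds=0.008)
t_20k = worst_case_scan_seconds(20_000, row_bytes=200, seek_seconds=0.008)
print(t_10k, t_20k)  # doubling the rows doubles the page count, hence the time
```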

The best-case time, ignoring memory caching, is (8 KB is the SQL Server page size):

time ~= 8KB * number of data pages / streaming IO rate in KB/s

This is also linear in the number of rows.
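The best-case formula can be sketched the same way. The page count and the streaming rate below are assumptions chosen for illustration, not figures for any particular drive, and the function name is hypothetical.

```python
def best_case_scan_seconds(pages, streaming_kb_per_s, page_kb=8):
    """time ~= 8KB * number of data pages / streaming IO rate in KB/s."""
    return page_kb * pages / streaming_kb_per_s


# Illustrative numbers only: 250,000 pages (about 2 GB) at 100,000 KB/s.
t_pages = best_case_scan_seconds(250_000, streaming_kb_per_s=100_000)
t_double = best_case_scan_seconds(500_000, streaming_kb_per_s=100_000)
print(t_pages, t_double)  # twice the pages takes twice as long
```

With an 8 ms seek per page, the worst-case formula would give 2,000 seconds for the same 250,000 pages, so the two estimates bracket a wide range even though both are linear.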

As long as you do a reasonable job of managing fragmentation, you could reasonably extrapolate linearly in this simple case. This assumes your data is much larger than the buffer cache. If not, you also have to worry about the cliff edge where your query changes from reading from the buffer to reading from disk.

I'm also ignoring details like parallel storage paths and access.
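The cliff edge can be illustrated with a deliberately crude toy model: data that fits entirely in the buffer cache is read at memory speed, and anything larger is read at disk speed. Real systems cache partially and degrade more gradually. Every number and name below is an assumption for illustration.

```python
def scan_seconds(data_kb, cache_kb, memory_kb_per_s, disk_kb_per_s):
    """Toy model: all-or-nothing caching, no partial hits."""
    if data_kb <= cache_kb:
        return data_kb / memory_kb_per_s  # served from the buffer cache
    return data_kb / disk_kb_per_s        # streamed from disk


# Illustrative numbers only: 1 GB cache, 5 GB/s memory, 100 MB/s disk.
cache_kb, mem_rate, disk_rate = 1_000_000, 5_000_000, 100_000
t_fits = scan_seconds(900_000, cache_kb, mem_rate, disk_rate)
t_spills = scan_seconds(1_800_000, cache_kb, mem_rate, disk_rate)
print(t_fits, t_spills)  # doubling the data here costs far more than 2x
```

In this toy model, doubling the data across the cache boundary multiplies the time by about 100 rather than 2, which is why linear extrapolation is only safe when both measurements sit on the same side of the cliff.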
