
Does the speed of the query depend on the number of rows in the table?

Let's say I have this query:

select * from table1 r where r.x = 5

Does the speed of this query depend on the number of rows that are present in table1?

Many factors affect the speed of a query, and the number of rows is one of them.

Others include:

  • index strategy (if you index column "x", you will see better performance than if it's not indexed)
  • server load
  • data caching: once you've executed a query, the data is added to the data cache, so subsequent reruns are much quicker because the data comes from memory rather than disk, until the data is evicted from the cache
  • execution plan caching, to a lesser extent: once a query has been executed for the first time, the execution plan SQL Server comes up with is cached for a period of time so that future executions can reuse it
  • server hardware
  • the way you've written the query (often one of the biggest contributors to poor performance!), e.g. writing something using a cursor instead of a set-based operation
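To see the first bullet (indexing column "x") in action, you can ask the database how it plans to run the query. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and the helper `plan_for`, the index name `idx_table1_x` and the table contents are all made up for the example. SQL Server shows its plans through its own tooling, and the wording of the plan text varies between SQLite versions.

```python
import sqlite3

QUERY = "SELECT * FROM table1 r WHERE r.x = 5"


def plan_for(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# Assumption: SQLite stands in for SQL Server; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO table1 (x, payload) VALUES (?, ?)",
    ((i % 1000, "row %d" % i) for i in range(10000)),
)

# Without an index on x, the planner has to scan the table.
plan_without_index = plan_for(conn, QUERY)

# With an index on x, the planner can search the index instead.
conn.execute("CREATE INDEX idx_table1_x ON table1 (x)")
plan_with_index = plan_for(conn, QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The first plan should describe a scan of the table and the second a search using the index.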

For databases with a large number of rows in tables, partitioning is usually something to consider (from SQL Server 2005 onwards, Enterprise Edition has built-in support). The idea is to split the data into smaller units. Generally, smaller units = smaller tables = smaller indexes = better performance.
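The "smaller units" idea can be sketched without a database at all. What follows is a hand-rolled range partitioning in plain Python, not SQL Server's partitioning feature; `PARTITION_WIDTH`, `partition_key` and the other names are hypothetical. The point is only that a lookup on the partitioning column touches one partition instead of the whole data set.

```python
from collections import defaultdict

# Hypothetical choice: each partition covers 1000 consecutive values of x.
PARTITION_WIDTH = 1000


def partition_key(x):
    """Map a value of x to the partition that holds it."""
    return x // PARTITION_WIDTH


def build_partitions(rows):
    """Split rows into per-range lists keyed by partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row["x"])].append(row)
    return partitions


def query_partitioned(partitions, x):
    """Scan only the partition that can contain x; return (matches, rows_examined)."""
    candidate = partitions.get(partition_key(x), [])
    matches = [row for row in candidate if row["x"] == x]
    return matches, len(candidate)


rows = [{"id": i, "x": i % 10000} for i in range(100000)]
partitions = build_partitions(rows)
matches, examined = query_partitioned(partitions, 5)
print(len(matches), "matches after examining", examined, "of", len(rows), "rows")
```

This only helps when the query filters on the column the data is partitioned by; a filter on some other column would still have to visit every partition.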

Yes, and it can be very significant.

If there are 100 million rows and no usable index, SQL Server has to go through each of them and see if it matches. That takes a lot more time than when there are only 10 rows.

You probably want an index on the 'x' column, in which case SQL Server might check the index rather than going through all the rows. That can be significantly faster, as SQL Server might not even need to check all the values in the index.

On the other hand, if 100 million rows match x = 5, the query is slower than if only 10 rows match.

Almost always yes. The real question is: at what rate does the query slow down as the table size increases? And the answer is: by not much if r.x is indexed, and by a large amount if not.
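A rough way to picture those two growth rates is to count the work done as N grows. The sketch below is an analogy rather than what a database engine literally does: a loop over every value stands in for an unindexed scan, and a binary search over a sorted list stands in for an index lookup (real indexes are typically B-trees). The function names are made up for the example.

```python
def full_scan(values, target):
    """Examine every value, as an unindexed scan must, and count the matches."""
    examined = 0
    matches = 0
    for value in values:
        examined += 1
        if value == target:
            matches += 1
    return examined, matches


def sorted_lookup(sorted_values, target):
    """Binary-search a sorted list; return (probes, found)."""
    lo, hi = 0, len(sorted_values)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == target
    return probes, found


results = {}
for n in (10, 1_000, 100_000):
    values = list(range(n))  # already sorted, one "row" per distinct x
    examined, _ = full_scan(values, 5)
    probes, found = sorted_lookup(values, 5)
    results[n] = (examined, probes)
    print(f"N={n:>7}: scan examined {examined} rows, sorted lookup used {probes} probes")
```

The scan's work grows in step with N, while the number of probes grows only with the logarithm of N.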

It is not so much the number of rows per se (up to a point, of course) as the amount of data (columns) that can make a query slow. The data also needs to be transferred from the backend to the frontend.
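Here is a small sketch of that point, again using Python's sqlite3 with an in-memory database as a stand-in for a real server. The `notes` column and the helper `fetched_size` are hypothetical, and counting characters is only a crude proxy for what would travel over the wire. Both queries return the same rows; only the number of columns differs.

```python
import sqlite3

# Assumption: an in-memory SQLite table with one deliberately wide column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER, notes TEXT)")
conn.executemany(
    "INSERT INTO table1 (x, notes) VALUES (?, ?)",
    ((i % 10, "n" * 500) for i in range(1000)),
)


def fetched_size(sql):
    """Return (row_count, approximate characters of data returned) for a query."""
    rows = conn.execute(sql).fetchall()
    size = sum(len(str(value)) for row in rows for value in row)
    return len(rows), size


wide_rows, wide_size = fetched_size("SELECT * FROM table1 r WHERE r.x = 5")
narrow_rows, narrow_size = fetched_size("SELECT r.id, r.x FROM table1 r WHERE r.x = 5")

print("SELECT *      :", wide_rows, "rows,", wide_size, "characters")
print("SELECT id, x  :", narrow_rows, "rows,", narrow_size, "characters")
```

Selecting only the columns you need keeps the row count the same while shrinking the amount of data handed back to the client.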

The answer is yes, but it is not the only factor. If you have done appropriate optimization and tuning, the performance drop will be negligible.

Main performance factors:

  • Indexing (clustered or nonclustered)
  • Data caching
  • Table partitioning
  • Execution plan caching
  • Data distribution
  • Hardware specs

There are some other factors, but these are the main ones. Even the way you design your schema affects performance.

You should assume that your query always depends on the number of rows. In fact, you should assume the worst case: linear, or O(N), for the example you provided, and exponential for more complex queries. There are database-specific manuals filled with tricks to help you avoid the worst case, but SQL itself is a language and doesn't specify how to execute your query. Instead, the database implementation decides how to execute any given query: if you have indexed a column or set of columns in your database, then you will get O(log(N)) performance for a simple lookup; if the system has effective query caching, you might get an O(1) response. Here is a good introductory article: High scalability: SQL and computational complexity
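The O(1) case can be pictured with a cache placed in front of the query. This is an application-level sketch using Python's functools.lru_cache, not the database's own query cache; SQLite is again a stand-in, and `rows_where_x` and `stats` are hypothetical names. The second call with the same argument is answered from memory without touching the table.

```python
import sqlite3
from functools import lru_cache

# Assumption: an in-memory SQLite table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany("INSERT INTO table1 (x) VALUES (?)", ((i % 100,) for i in range(10000)))

# Counts how often the database is actually queried.
stats = {"database_calls": 0}


@lru_cache(maxsize=128)
def rows_where_x(x):
    """Run the query once per distinct x; repeat calls are served from the cache."""
    stats["database_calls"] += 1
    cursor = conn.execute("SELECT * FROM table1 r WHERE r.x = ?", (x,))
    return tuple(cursor.fetchall())


first = rows_where_x(5)   # goes to the database
second = rows_where_x(5)  # answered from the cache
print(len(first), "rows,", stats["database_calls"], "database call(s)")
```

The catch is staleness: a cache like this keeps returning the old result after the table changes unless it is invalidated (here, with `rows_where_x.cache_clear()`).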

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address.

 