
Where does SQL Server get the estimated number of rows when AUTO_CREATE_STATISTICS is off?

Where does SQL Server get the Estimated Number of Rows when you have AUTO_CREATE_STATISTICS turned off?

Here is an example:

Setup experiment:


USE master;
GO

IF EXISTS(SELECT * FROM sys.databases WHERE name = 'TestDatabase')
BEGIN
    ALTER DATABASE TestDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE TestDatabase;
END

GO

CREATE DATABASE TestDatabase;

GO

ALTER DATABASE TestDatabase SET AUTO_CREATE_STATISTICS OFF;

GO

USE TestDatabase;

GO

DROP TABLE IF EXISTS TestTable;

GO

CREATE TABLE TestTable
(
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50)
);

Please see the edit below for the updated code.

Experiment:

Insert 200 rows:

SET NOCOUNT ON;
INSERT INTO TestTable Values('Test', 'Blah')
GO 200

Highlight the query below and click Display Estimated Execution Plan:

SELECT * 
FROM TestTable 
WHERE LastName = 'blah';

It gives me an estimated number of rows of 200.

Insert another 200 rows:

SET NOCOUNT ON;
INSERT INTO TestTable Values('Test', 'Blah')
GO 200

Once again, highlight the query below and click Display Estimated Execution Plan:

SELECT * 
FROM TestTable 
WHERE LastName = 'blah';

It gives me an estimated number of rows of 400.

Now I actually run the query instead of only getting the estimate:

SELECT * 
FROM TestTable 
WHERE LastName = 'blah';

Now I insert another 200 rows:

SET NOCOUNT ON;
INSERT INTO TestTable Values('Test', 'Blah')
GO 200

Once again, highlight the query below and click Display Estimated Execution Plan:

SELECT * 
FROM TestTable 
WHERE LastName = 'blah';

It once again gives me an estimated number of rows of 400 instead of 600.

So I insert another 10,000 rows:

SET NOCOUNT ON;
INSERT INTO TestTable Values('Test', 'Blah')
GO 10000

Once again, highlight the query below and click Display Estimated Execution Plan:

SELECT * 
FROM TestTable 
WHERE LastName = 'blah';

It gives me an estimate of 400 rows instead of 10,600.

So it appears that if you get the estimated number of rows before ever running the query, it gives you the current total number of rows in the table. Once you have run the query, it keeps giving you the total number of rows the table had when the query was run.

So where exactly is SQL Server getting this number from?

---EDIT 1: I got my results wrong, sorry about the confusion---

When I got the above estimates yesterday, I had changed my experiment to AUTO_CREATE_STATISTICS ON to test that behavior and didn't realize it was still on. Sorry about the confusion.

So let me show my results with AUTO_CREATE_STATISTICS OFF:

  • I start with a new database and a new table with AUTO_CREATE_STATISTICS OFF.
  • I insert 200 rows.
  • Highlight the query below and click Display Estimated Execution Plan:
SELECT * 
FROM TestTable 
WHERE LastName = 'blah';


  • Estimates are:
    • Estimated Number of Rows to be Read is 200 rows
    • Estimated Number of Rows for All Executions is 14.142 (which is 200.000^.50).
  • I insert another 200 rows.
  • Click Display Estimated Execution Plan again for the above query.
  • Estimates are:
    • Estimated Number of Rows to be Read is 400 rows
    • Estimated Number of Rows for All Executions is 20 (which is 400.000^.50).
  • Now I run the above query and get 400 rows as expected.
  • I insert another 200 rows.
  • Click Display Estimated Execution Plan again for the above query.
  • Estimates are:
    • Estimated Number of Rows to be Read is 400 rows
    • Estimated Number of Rows for All Executions is 20 (which is 400.000^.50).
  • I insert 10000 rows.
  • Estimates are:
    • Estimated Number of Rows to be Read is 400 rows
    • Estimated Number of Rows for All Executions is 20 (which is 400.000^.50).
  • Do a statistics update:
UPDATE STATISTICS TestTable;
  • Estimates are the same even after the statistics update:
    • Estimated Number of Rows to be Read is 400 rows
    • Estimated Number of Rows for All Executions is 20 (which is 400.000^.50).

So it appears that once you run a query, the row count used to calculate the estimates is locked in, even after a statistics update. This seems really bad.

  • The question is, why does it use the locked-in row count after you run a query instead of the current number of rows in the table?
  • Is this row count stored in a DMV where it can be viewed?

---End of EDIT 1---

When AUTO_CREATE_STATISTICS is disabled, SQL Server still uses its "cardinality estimator" to estimate the number of rows that a query will return.

The cardinality estimator bases this estimate on a number of factors, including the data types and distributions of the columns in the table, the specific predicates used in the query, and any available statistics.

In your example, the estimate appears to reflect the total number of rows in the table rather than the selectivity of the specific predicate in the query ("LastName = 'blah'").

This is most likely because the table has no statistics on that column, which would normally be used to estimate the number of rows more accurately.
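To check which statistics actually exist on the table, something like the following sketch may help. It assumes the table lives in the dbo schema, and the statistics name st_TestTable_LastName is a hypothetical name chosen for illustration. With the setup from the question, only the statistics belonging to the primary key index on Id would be expected to show up, and nothing on LastName.

```sql
-- List the statistics objects on the table, with their last update time
-- and the row count recorded when they were last updated.
SELECT s.name,
       s.auto_created,
       s.user_created,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
OUTER APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.TestTable');

-- With AUTO_CREATE_STATISTICS OFF, column statistics can still be created
-- by hand. The name below is hypothetical.
CREATE STATISTICS st_TestTable_LastName
ON dbo.TestTable (LastName);
```

Manually created statistics give the optimizer a histogram for LastName even though it will not create one on its own.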

I strongly suggest never setting AUTO_CREATE_STATISTICS to OFF, so hopefully this question is just of academic interest.
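Since it is easy to lose track of which way the option is currently set, a quick check against sys.databases can confirm it before running an experiment. This is only a sketch and uses the database name from the question.

```sql
-- 1 means the option is ON, 0 means it is OFF.
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on
FROM sys.databases
WHERE name = N'TestDatabase';

-- Turn the option back on once the experiment is finished.
ALTER DATABASE TestDatabase SET AUTO_CREATE_STATISTICS ON;
```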

SQL Server knows the table cardinality (number of rows in the table), as this is held in table metadata independent of statistics. When you say "estimated # rows of 200" etc., are you referring to "estimated number of rows to be read"?

As there is no index on LastName, it has to scan the whole table to find rows matching the WHERE clause, so the estimated number of rows to be read will be the same as the table cardinality. No cardinality estimator model will assume that 100% of those rows match LastName = 'blah', though. For 200 rows in the table I got an estimate of 53 with QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_110 and 14 (SQRT(200)) with all later ones.

Either way, this is just based on a guessed proportion of rows that will match, because (in the absence of any column statistics or constraints) there is nothing else to base this number on.
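One place where the current row count from metadata can be inspected is sys.dm_db_partition_stats. The sketch below assumes the table is in the dbo schema. It shows the live count, which is not necessarily the figure baked into a plan that was compiled earlier.

```sql
-- Current row count from metadata. index_id 0 is a heap and 1 is the
-- clustered index, so this counts each row once.
SELECT SUM(ps.row_count) AS TableRowCount
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.TestTable')
  AND ps.index_id IN (0, 1);
```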
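The guessed figures can be reproduced with simple arithmetic. The square root matches the 14.142 and 20 reported in the question. The 0.75 exponent is an assumption on my part: it happens to give about 53 for 200 rows, which agrees with the number above, but nothing in this thread confirms that this is the formula the older model uses.

```sql
-- Guessed row counts for an equality predicate on a column without statistics.
-- The table cardinalities 200 and 400 are taken from the question.
SELECT t.TableRows,
       SQRT(CAST(t.TableRows AS float))        AS SqrtGuess,     -- about 14.142 for 200, 20 for 400
       POWER(CAST(t.TableRows AS float), 0.75) AS Power075Guess  -- about 53.18 for 200 (assumed formula)
FROM (VALUES (200), (400)) AS t (TableRows);

-- Compile the question's query under the older optimizer model to compare
-- the estimates. This needs a SQL Server version that accepts this hint name.
SELECT *
FROM dbo.TestTable
WHERE LastName = 'blah'
OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_110'));
```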


When you generate an estimated execution plan, it is not stored in the plan cache. When you actually run the query, the plan is cached. This is why you see estimates of "rows read" in line with the actual table cardinality when all you have done so far is generate estimated plans.
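Whether a plan for the query is currently cached can be checked with the plan cache DMVs. This is a rough sketch: it needs VIEW SERVER STATE permission, and the LIKE filter is only a convenient way to find the statement by its text.

```sql
-- Look for cached plans whose text mentions TestTable.
-- The second filter excludes this diagnostic query itself.
SELECT cp.usecounts,
       cp.objtype,
       st.text,
       qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE N'%TestTable%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';
```

If no row comes back after only displaying estimated plans, and one appears after actually running the SELECT, that is consistent with the explanation above. Opening the query_plan XML shows the estimates stored with the cached plan.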

"It gives me an estimated # rows of 400.

Now I run query instead of getting estimates"

You executed the query when the table had 400 rows, and this added the execution plan to the cache (complete with the 400 row estimate), so future executions can use this cached plan.

Once the plan is in the cache, you are dependent on whether the number of modified rows triggers an optimality-based recompile, giving you a new execution plan, or stays under the threshold, in which case the cached plan is simply reused.

Usually, adding 10,200 rows to a 400-row table would be far in excess of what is required to cross the "Recompilation Threshold" and trigger an optimality-based recompile. I assume this does not happen here because no statistics were ever used in the plan, so there are none that could ever be deemed stale.
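If the stale 400-row figure does come from the cached plan, forcing a fresh compilation should bring the estimate back in line with the current table cardinality. The sketch below shows two ways to try that on a test database. I have not verified the outcome against the exact setup in the question, so treat it as something to experiment with.

```sql
-- Option 1: compile a fresh plan for this one statement.
SELECT *
FROM dbo.TestTable
WHERE LastName = 'blah'
OPTION (RECOMPILE);

-- Option 2: clear the plan cache for the current database (SQL Server 2016
-- and later). Suitable only for a test database such as TestDatabase.
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
```

After either step, displaying the estimated plan for the original query again shows whether the row estimates have moved away from 400.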
