How inaccurate can sys.dm_db_partition_stats.row_count be when getting an Azure SQL DB row count for each table?

I have seen a number of general statements claiming that sys.dm_db_partition_stats.row_count can produce inaccurate results because it reports objects' statistics instead of actually doing a COUNT(). However, I have never been able to find any deeper reasons behind those statements, or to validate the hypothesis on my Azure SQL DB.

So I would like to learn:

  1. How inaccurate can this method actually be?
  2. Why exactly might the results be skewed?
    (e.g. stats are only recalculated once per day, or only on specific object operations).

Any related insight is much appreciated!



I was able to find out several things on my own, mostly by running various queries against sys.dm_db_partition_stats.row_count while knowing the actual row count of each table.
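A check of this kind can be sketched as follows: read the view's count for one table and put it next to an exact count. This is only a minimal illustration; dbo.MyTable is a hypothetical table name, and the index_id filter assumes you want the heap or clustered index rows only.

```sql
-- dbo.MyTable is a hypothetical name; substitute one of your own tables.
SELECT
    -- Count reported by the DMV, summed across partitions
    (SELECT SUM(ps.row_count)
     FROM sys.dm_db_partition_stats AS ps
     WHERE ps.object_id = OBJECT_ID(N'dbo.MyTable')
       AND ps.index_id IN (0, 1)) AS dmv_row_count,
    -- Exact count obtained by scanning the table
    (SELECT COUNT_BIG(*) FROM dbo.MyTable) AS exact_row_count;
```

The exact count scans the table, so on large tables it can take much longer than the DMV lookup.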

Here's the final query I came up with.
It returns a fast and (in my case) accurate row count for each table, sorted from high count to low.

SELECT 
    (SCHEMA_NAME(A.schema_id) + '.' + A.Name) AS table_name,  
    B.object_id, B.index_id, B.row_count 
FROM  
    sys.dm_db_partition_stats B 
LEFT JOIN 
    sys.objects A 
    ON A.object_id = B.object_id 
WHERE 
    SCHEMA_NAME(A.schema_id) <> 'sys'  -- exclude objects in the sys schema
    AND B.index_id IN (0, 1)           -- 0 = heap, 1 = clustered index
ORDER BY 
    B.row_count DESC;

The first condition in the WHERE clause excludes system tables, e.g. sys.plan_persist_wait_stats and many others.

The second condition keeps only the heap (index_id = 0) or the clustered index (index_id = 1), which filters out nonclustered indexes. Those have their own rows in the view, so if you don't filter them out you get a double (or higher) row count for indexed tables when using GROUP BY A.schema_id, A.Name, or several records with the same table_name in the query output (if you don't use GROUP BY).
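The view also returns one row per partition, so a partitioned table can still appear more than once even with the index_id filter in place. A possible variant that sums the partitions into one row per table is sketched below. It is not the query from the post: the use of sys.tables and sys.schemas, the is_ms_shipped filter, and the alias total_rows are choices made for this illustration.

```sql
-- Sketch: one row per user table, with partitions summed together.
SELECT
    s.name + N'.' + t.name AS table_name,
    SUM(ps.row_count)      AS total_rows
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.tables  AS t ON t.object_id = ps.object_id
INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)   -- heap or clustered index only
  AND t.is_ms_shipped = 0     -- skip tables created by internal components
GROUP BY s.name, t.name
ORDER BY total_rows DESC;
```

Joining to sys.tables rather than sys.objects restricts the output to user tables, so the schema-name filter is not needed in this form.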

We're glad that you found the solution and solved it yourself. Your update should be an answer, so I am posting the findings and the query from your question above as an answer on your behalf; this can be beneficial to other community members.

Thanks again for sharing.

And thanks for @conor's comment: "If you want to see how far off the numbers can be, I suggest you try doing user transactions, inserting a bunch of rows, then roll back the transaction."
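One way to try that suggestion is sketched below. The table dbo.RowCountDemo and the row volume are hypothetical, and the script makes no claim about what numbers you will see; the point is to read the view while the insert is still uncommitted, read it again after the rollback, and compare both with an exact count.

```sql
-- Hypothetical scratch table, created outside the transaction so it survives the rollback.
CREATE TABLE dbo.RowCountDemo (id INT NOT NULL PRIMARY KEY);

BEGIN TRANSACTION;

-- Insert a batch of rows that will never be committed.
INSERT INTO dbo.RowCountDemo (id)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- DMV value while the insert is still in flight
SELECT SUM(row_count) AS dmv_rows_in_flight
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'dbo.RowCountDemo')
  AND index_id IN (0, 1);

ROLLBACK TRANSACTION;

-- DMV value after the rollback, next to an exact count
SELECT SUM(row_count) AS dmv_rows_after_rollback
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'dbo.RowCountDemo')
  AND index_id IN (0, 1);

SELECT COUNT_BIG(*) AS exact_rows FROM dbo.RowCountDemo;

-- Clean up the scratch table.
DROP TABLE dbo.RowCountDemo;
```

Reading the view from a second session while the transaction is open is another variation worth trying.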
