How inaccurate can sys.dm_db_partition_stats.row_count be when getting an Azure SQL DB row count for each table?
I have seen a number of general statements on how sys.dm_db_partition_stats.row_count can produce inaccurate results because it reports the objects' statistics instead of actually doing a COUNT(). However, I have never been able to find any deeper reasons behind those statements or validate the hypothesis on my Azure SQL DB.
So I would like to learn -
Any related insight is much appreciated!
Several things I was able to find out on my own, mostly by running various queries containing sys.dm_db_partition_stats.row_count while knowing the actual row counts in each table.
Here's the final query I came up with. It gets a fast and (in my case) accurate row count for each table, sorted from high count to low.
    SELECT
        SCHEMA_NAME(A.schema_id) + '.' + A.name AS table_name,
        B.object_id, B.index_id, B.row_count
    FROM
        sys.dm_db_partition_stats B
    INNER JOIN
        sys.objects A
        ON A.object_id = B.object_id
    WHERE
        SCHEMA_NAME(A.schema_id) <> 'sys'   -- exclude system tables
        AND B.index_id IN (0, 1)            -- heap (0) or clustered index (1) only
    ORDER BY
        B.row_count DESC;
The first line of the WHERE clause excludes system tables, e.g. sys.plan_persist_wait_stats and many others.
The second line keeps only the heap (index_id = 0) or the clustered index (index_id = 1) of each table. Nonclustered indexes (index_id greater than 1) apparently have their own rows and row counts in sys.dm_db_partition_stats, so if you don't filter them out, you get a doubled row count for indexed tables when using GROUP BY A.schema_id, A.Name, or two records with the same table_name in the query output if you don't use GROUP BY.
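The GROUP BY variant mentioned above could look roughly like the sketch below. It is not the query from the post: it assumes that joining to sys.tables and filtering on is_ms_shipped is an acceptable substitute for the schema-name check, and it sums row_count so that a partitioned table (one row per partition in the DMV) still produces a single line. The aliases ps, t and total_rows are illustrative only.

```sql
-- Sketch under the assumptions stated above: one output row per user table.
SELECT
    SCHEMA_NAME(t.schema_id) + '.' + t.name AS table_name,
    SUM(ps.row_count) AS total_rows          -- adds up all partitions of the table
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.tables AS t
    ON t.object_id = ps.object_id
WHERE ps.index_id IN (0, 1)                  -- heap or clustered index only
  AND t.is_ms_shipped = 0                    -- skip tables shipped by Microsoft
GROUP BY t.schema_id, t.name
ORDER BY total_rows DESC;
```

Without the index_id filter, the SUM would also add the rows reported for each nonclustered index, which is the double counting described above.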
We're glad that you found the solution and solved it yourself. Your edit should be an answer. I am just helping you post it as an answer, which can be beneficial to other community members:
Several things I was able to find out on my own, mostly by running various queries containing sys.dm_db_partition_stats.row_count while knowing the actual row counts in each table.
Here's the final query I came up with. It gets a fast and (in my case) accurate row count for each table, sorted from high count to low.
    SELECT
        SCHEMA_NAME(A.schema_id) + '.' + A.name AS table_name,
        B.object_id, B.index_id, B.row_count
    FROM
        sys.dm_db_partition_stats B
    INNER JOIN
        sys.objects A
        ON A.object_id = B.object_id
    WHERE
        SCHEMA_NAME(A.schema_id) <> 'sys'   -- exclude system tables
        AND B.index_id IN (0, 1)            -- heap (0) or clustered index (1) only
    ORDER BY
        B.row_count DESC;
The first line of the WHERE clause excludes system tables, e.g. sys.plan_persist_wait_stats and many others.
The second line keeps only the heap (index_id = 0) or the clustered index (index_id = 1) of each table. Nonclustered indexes (index_id greater than 1) apparently have their own rows and row counts in sys.dm_db_partition_stats, so if you don't filter them out, you get a doubled row count for indexed tables when using GROUP BY A.schema_id, A.Name, or two records with the same table_name in the query output if you don't use GROUP BY.
Thanks again for sharing.
And thanks for @conor's comment: "If you want to see how far off the numbers can be, I suggest you try doing user transactions, inserting a bunch of rows, then roll back the transaction."
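One way to try the experiment from that comment is sketched below. Everything specific here is an assumption made for illustration: the scratch table dbo.RowCountDemo, the 10,000-row batch, and the use of sys.all_columns as a row source. The script only shows where to look; what the DMV reports at each step may depend on timing and on which session runs the check, so running the COUNT(*) from a second session while the transaction is still open is probably the more telling comparison.

```sql
-- Hypothetical scratch table, used only for this experiment.
CREATE TABLE dbo.RowCountDemo (id INT NOT NULL PRIMARY KEY);

BEGIN TRANSACTION;

-- Insert 10,000 rows that are never committed.
INSERT INTO dbo.RowCountDemo (id)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Row count reported by the DMV while the transaction is still open.
SELECT SUM(row_count) AS dmv_row_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.RowCountDemo')
  AND index_id IN (0, 1);

ROLLBACK TRANSACTION;

-- After the rollback, compare the DMV figure with an actual count.
SELECT
    (SELECT SUM(row_count)
     FROM sys.dm_db_partition_stats
     WHERE object_id = OBJECT_ID('dbo.RowCountDemo')
       AND index_id IN (0, 1)) AS dmv_row_count,
    (SELECT COUNT(*) FROM dbo.RowCountDemo) AS actual_row_count;

-- Clean up the scratch table.
DROP TABLE dbo.RowCountDemo;
```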