简体   繁体   中英

How inaccurate can the sys.dm_db_partition_stats.row_count be in getting an Azure SQL DB row count for each table?

I have seen a number of general statements on how sys.dm_db_partition_stats.row_count can produce inaccurate results due to providing objects' statistics instead of actually doing a COUNT() . However, I have never been able to find any deeper reasons behind those statements or validate the hypothesis on my Azure SQL DB.

So I would like to learn -

  1. How inaccurate this method can actually be?
  2. Why exactly the results might be skewed?
    (eg stats are only recalculated once per day / on specific object operation).

Any related insight is much appreciated !



Several things I was able to find out on my own -- mostly by running various queries containing sys.dm_db_partition_stats.row_count , while knowing actual row counts in each table.

Here's a final query I came up with
This gets fast and (in my case) accurate row count for each table, sorted from high count to low.

SELECT 
    (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name,  
    B.object_id, B.index_id, B.row_count 
FROM  
    sys.dm_db_partition_stats B 
LEFT JOIN 
    sys.objects A 
    ON A.object_id = B.object_id 
WHERE 
    SCHEMA_NAME(A.schema_id) <> 'sys' 
    AND (B.index_id = '0' OR B.index_id = '1') 
ORDER BY 
    B.row_count DESC 

First line of WHERE clause is used to exclude system tables, eg sys.plan_persist_wait_stats and many others.

Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don't filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name or two records with the same table_name in the query output (if you don't use GROUP BY )

We're glad that you found the solution and solved it by yourself. Your new edition should be an answer. I just help you post it as answer and this can be beneficial to other community members:

Several things I was able to find out on my own -- mostly by running various queries containing sys.dm_db_partition_stats.row_count , while knowing actual row counts in each table.

Here's a final query I came up with This gets fast and (in my case) accurate row count for each table, sorted from high count to low.

SELECT 
    (SCHEMA_NAME(A.schema_id) + '.' + A.Name) as table_name,  
    B.object_id, B.index_id, B.row_count 
FROM  
    sys.dm_db_partition_stats B 
LEFT JOIN 
    sys.objects A 
    ON A.object_id = B.object_id 
WHERE 
    SCHEMA_NAME(A.schema_id) <> 'sys' 
    AND (B.index_id = '0' OR B.index_id = '1') 
ORDER BY 
    B.row_count DESC 

First line of WHERE clause is used to exclude system tables, eg sys.plan_persist_wait_stats and many others.

Second line takes care of non-unique non-clustered indexes (which are objects and apparently have their own stats) -> if you don't filter them out, you get double row count for indexed tables when using GROUP BY A.schema_id, A.Name or two records with the same table_name in the query output (if you don't use GROUP BY )

Thanks for your sharing again.

And thanks for @conor's commnet: "If you want to see how far off the numbers can be, I suggest you try doing user transactions, inserting a bunch of rows, then roll back the transaction."

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM