
How do you calculate data completeness for multiple tables based on null values within columns?

The query below calculates what we need, but only for one specific column. How can we do this for all the columns in that table without duplicating the CASE expression for each one? This needs to be done for hundreds of tables, so duplicating it by hand is not ideal.

 Select SUM(cast(case when column is null then 0 else 1 end as float))/count(*) from [Table]

So the output would be something like

Column Name: Data completeness

Customer Name: 88%

First, you can simplify the logic to:

Select AVG(case when column is null then 0.0 else 1.0 end)
from [Table]
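As a quick sanity check of the AVG form, here is a minimal sketch against a throwaway table variable. The table @Demo and its CustomerName column are made up for illustration only.

```sql
-- Hypothetical sample data: 4 rows, one of them NULL.
DECLARE @Demo TABLE (CustomerName VARCHAR(50) NULL);
INSERT INTO @Demo (CustomerName) VALUES ('Ann'), (NULL), ('Bob'), ('Cy');

-- 3 of the 4 rows are non-NULL, so the completeness ratio is 0.75.
SELECT AVG(CASE WHEN CustomerName IS NULL THEN 0.0 ELSE 1.0 END) AS Completeness
FROM @Demo;
```

Using 0.0 and 1.0 rather than 0 and 1 matters here: AVG over integers would do integer arithmetic and truncate the result.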

Then, you can generate the code. The following generates the expressions for the SELECT list, one per column. You can copy them into the query (and drop the trailing comma after the last one):

select replace('      avg(case when [@col] is null then 0.0 else 1.0 end) as [@col],',
               '@col', column_name)
from information_schema.columns
where table_name = @TableName and table_schema = @SchemaName

Note: quotename() is more correct, but the above should work for reasonable column names (I never have column names that need to be quoted).
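For reference, a QUOTENAME-based variant of the generator might look like the sketch below. The schema and table names assigned to the variables are placeholders, not names from the question.

```sql
-- Placeholder target; replace with your own schema and table names.
DECLARE @SchemaName SYSNAME = N'dbo';
DECLARE @TableName  SYSNAME = N'Customer';

-- QUOTENAME adds the [brackets] and escapes any ] inside the column name,
-- so the template itself no longer contains brackets.
SELECT REPLACE(N'      avg(case when @col is null then 0.0 else 1.0 end) as @col,',
               N'@col', QUOTENAME(COLUMN_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TableName AND TABLE_SCHEMA = @SchemaName
ORDER BY ORDINAL_POSITION;
```

The ORDER BY only keeps the generated lines in the table's column order; it does not affect the result of the final query.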

Solution by Jens Suessmeyer from Finding the percentage of NULL values for each column in a table

SET NOCOUNT ON
DECLARE @Statement NVARCHAR(MAX) = ''
DECLARE @Statement2 NVARCHAR(MAX) = ''
DECLARE @FinalStatement NVARCHAR(MAX) = ''

DECLARE @TABLE_SCHEMA SYSNAME = <SCHEMA_NAME>
DECLARE @TABLE_NAME SYSNAME = <TABLE_NAME>

SELECT
        @Statement = @Statement + 'SUM(CASE WHEN ' + COLUMN_NAME + ' IS NULL THEN 1 ELSE 0 END) AS ' + COLUMN_NAME + ',' + CHAR(13) ,
        @Statement2 = @Statement2 + COLUMN_NAME + '*100 / OverallCount AS ' + COLUMN_NAME + ',' + CHAR(13)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TABLE_NAME 
    AND TABLE_SCHEMA = @TABLE_SCHEMA

IF @@ROWCOUNT = 0
    RAISERROR('TABLE OR VIEW with schema "%s" and name "%s" does not exist or you do not have appropriate permissions.',16,1, @TABLE_SCHEMA, @TABLE_NAME)
ELSE
BEGIN
    SELECT @FinalStatement =
            'SELECT ' + LEFT(@Statement2, LEN(@Statement2) -2) + ' FROM (SELECT ' + LEFT(@Statement, LEN(@Statement) -2) +
            ', COUNT(*) AS OverallCount FROM ' + @TABLE_SCHEMA + '.' + @TABLE_NAME + ') SubQuery'
    EXEC(@FinalStatement)
END

Something like this should work. Basically build a statement that selects the count for each column out of each table, using sys.tables and sys.columns, and then execute that statement.

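Begin
    -- Assumes a table mystats (TableName, ColumnName, TotCount) already exists.
    DECLARE @sqlcmd NVARCHAR(MAX) = N''

    -- One INSERT ... SELECT per column; count(col) counts only non-NULL values.
    Select @sqlcmd = @sqlcmd + N'insert into mystats (TableName, ColumnName, TotCount) select '
             + QUOTENAME(t.name, '''') + N', ' + QUOTENAME(c.name, '''')
             + N', count(' + QUOTENAME(c.name) + N') from '
             + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name) + N';' + CHAR(13)
    From sys.tables t inner join sys.columns c
    On c.object_id = t.object_id

    EXEC (@sqlcmd)
END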
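Since the question asks for a completeness percentage across hundreds of tables, the same idea can be pushed a little further. The following is only a sketch under a few assumptions: SQL Server 2017 or later (for STRING_AGG), a temp table named #Completeness that I made up for the results, and columns of type text, ntext, or image being skipped because COUNT() does not accept them.

```sql
-- Hypothetical result table; one row per (schema, table, column).
CREATE TABLE #Completeness (
    SchemaName   SYSNAME,
    TableName    SYSNAME,
    ColumnName   SYSNAME,
    Completeness DECIMAL(5, 2) NULL   -- percentage of non-NULL rows; NULL for empty tables
);

DECLARE @sql NVARCHAR(MAX);

-- Build one INSERT ... SELECT per column. COUNT(col) counts non-NULL values,
-- and NULLIF avoids a divide-by-zero error on empty tables.
SELECT @sql = STRING_AGG(CAST(
         N'INSERT INTO #Completeness SELECT '
       + QUOTENAME(s.name, '''') + N', '
       + QUOTENAME(t.name, '''') + N', '
       + QUOTENAME(c.name, '''')
       + N', 100.0 * COUNT(' + QUOTENAME(c.name) + N') / NULLIF(COUNT(*), 0) FROM '
       + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name) + N';'
       AS NVARCHAR(MAX)), CHAR(13))
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.columns c ON c.object_id = t.object_id
WHERE c.system_type_id NOT IN (34, 35, 99);  -- image, text, ntext

EXEC sys.sp_executesql @sql;

SELECT SchemaName, TableName, ColumnName, Completeness
FROM #Completeness
ORDER BY SchemaName, TableName, Completeness;

DROP TABLE #Completeness;
```

This scans each table once per column, which is simple but slow on large tables. Generating one statement per table with all columns in a single SELECT list, as in the other answers, would reduce that to one scan per table.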

You can use an UNPIVOT query to do this. In the following query I have assumed that you have three columns (Column1, Column2, Column3); the query can be extended to accommodate as many columns as you need.

Query

SELECT ColumnName 
      , SUM(cast(case when Vals = '' then 0.0 else 1.0 end as DECIMAL(10,2))) * 100
      / COUNT(*)  AS [Percentage]
FROM (
-- Cast before ISNULL so that NULLs in non-string columns also become ''.
-- Note: genuine empty strings are counted as missing too.
SELECT ISNULL(CAST(Column1 AS VARCHAR(100)), '') AS Column1
      ,ISNULL(CAST(Column2 AS VARCHAR(100)), '') AS Column2
      ,ISNULL(CAST(Column3 AS VARCHAR(100)), '') AS Column3
FROM TableName
  )c
  UNPIVOT (Vals FOR ColumnName IN (Column1,Column2,Column3))up
GROUP BY ColumnName

Result Set

╔════════════╦════════════╗
║ ColumnName ║ Percentage ║
╠════════════╬════════════╣
║ Column1    ║ 100.000000 ║
║ Column2    ║ 100.000000 ║
║ Column3    ║ 34.065934  ║
╚════════════╩════════════╝

Important Note

Make sure you convert all the columns being used in UNPIVOT IN clause to a uniform data type.

Also, wrapping each column in ISNULL(..., '') is important because UNPIVOT drops rows whose value is NULL, so without it the missing values would never be counted.
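One caveat about the ISNULL wrapping: ISNULL returns the data type of its first argument, so on a non-string column the '' replacement is implicitly converted to that column's type. The small sketch below (the variable @n is just an illustration) shows why the order of CAST and ISNULL matters for an INT.

```sql
DECLARE @n INT = NULL;

SELECT CAST(ISNULL(@n, '') AS VARCHAR(100)) AS IsNullFirst,  -- '' becomes INT 0, so this is '0'
       ISNULL(CAST(@n AS VARCHAR(100)), '') AS CastFirst;    -- stays '', as intended
```

With ISNULL applied first, a NULL in a numeric column ends up as '0' and would be counted as a present value, so casting to the common string type first is the safer order.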

My answer combines the sample from lad2025's answer and the UNPIVOT from M.Ali's answer to provide you with a result set with a row for each column containing the name of the column and the percentage of nulls. It will show them in descending order by percentage of nulls.

SET NOCOUNT ON
DECLARE @Statement NVARCHAR(MAX) = ''
DECLARE @Statement2 NVARCHAR(MAX) = ''
DECLARE @Statement3 NVARCHAR(MAX) = ''
DECLARE @FinalStatement NVARCHAR(MAX) = ''

DECLARE @TABLE_SCHEMA SYSNAME = <SCHEMA_Name>
DECLARE @TABLE_NAME SYSNAME = <TABLE_Name>

SELECT
        @Statement = @Statement + 'SUM(CASE WHEN ' + COLUMN_NAME + 
            ' IS NULL THEN 1 ELSE 0 END) AS ' + COLUMN_NAME + ',' + CHAR(13) ,
        @Statement2 = @Statement2 + COLUMN_NAME + 
            '*100 / OverallCount AS ' + COLUMN_NAME + ',' + CHAR(13),
        @Statement3 = @Statement3 + COLUMN_NAME + ','
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = @TABLE_NAME 
    AND TABLE_SCHEMA = @TABLE_SCHEMA

IF @@ROWCOUNT = 0
    RAISERROR('TABLE OR VIEW with schema "%s" and name "%s" does not exist or you do not have appropriate permissions.',16,1, @TABLE_SCHEMA, @TABLE_NAME)
ELSE
BEGIN
    SELECT @FinalStatement =
            'SELECT u.ColumnName, u.NullPercentage FROM (SELECT ' + 
            LEFT(@Statement2, LEN(@Statement2) -2) + 
            ' FROM (SELECT ' + LEFT(@Statement, LEN(@Statement) -2) +
            ', COUNT(*) AS OverallCount FROM ' + @TABLE_SCHEMA + '.' + @TABLE_NAME + 
            ') SubQuery) PercentageQuery unpivot (NullPercentage for ColumnName in (' + 
            LEFT(@Statement3, LEN(@Statement3) - 1) + 
            ')) u ORDER BY NullPercentage DESC'
    EXEC(@FinalStatement)
END
