
TSQL: Check columns for at least one non-null value

I want to write a TSQL query that independently checks a set of columns in a table to see which ones contain at least one non-null value. Each column's check should return T/F (1/0) accordingly.

The first thing that came to mind was to use the COUNT aggregate function. Since COUNT(expression) excludes nulls from the resulting total, if the COUNT is > 0, there's non-null data.

This seems a bit heavy-handed in that it has to count all data. I really just need to know if there's at least one non-null value in each column:

    SELECT 
        CAST(CASE WHEN COUNT(t.Column1) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn1Data,
        CAST(CASE WHEN COUNT(t.Column2) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn2Data,
        CAST(CASE WHEN COUNT(t.Column3) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn3Data,
        CAST(CASE WHEN COUNT(t.Column4) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn4Data
    FROM dbo.Table AS t
    WHERE t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp

Any ideas that might be more optimal?

If you have indexes on the columns, the following may be faster:

select (case when exists (select 1
                          from table t
                          where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
                                column1 is not null
                         )
             then 1 else 0 end) as HasColumn1Data,
       (case when exists (select 1
                          from table t
                          where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
                                column2 is not null
                         )
             then 1 else 0 end) as HasColumn2Data,
       (case when exists (select 1
                          from table t
                          where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
                                column3 is not null
                         )
             then 1 else 0 end) as HasColumn3Data,
       (case when exists (select 1
                          from table t
                          where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
                                column4 is not null
                         )
             then 1 else 0 end) as HasColumn4Data;

Without indexes, this would perform about four full-table scans (admittedly, each one stops at the first non-NULL value), so it would probably be slower than the single-pass aggregate query in the question.
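As a rough sketch of what "indexes on the columns" could look like for this query, one option is a filtered index per column, keyed on the timestamp. The index names below are made up for illustration, `dbo.[Table]` stands in for the question's placeholder table, and filtered indexes require SQL Server 2008 or later. Whether the optimizer actually picks them depends on your data and plan, so treat this as something to test rather than a guaranteed win.

```sql
-- Hypothetical index names; dbo.[Table] and its columns are the question's placeholders.
-- Each filtered index contains only the rows where its column is non-null,
-- so the matching EXISTS probe can become a range seek on [TimeStamp].
CREATE NONCLUSTERED INDEX IX_Table_TimeStamp_Column1
    ON dbo.[Table] ([TimeStamp])
    WHERE Column1 IS NOT NULL;

CREATE NONCLUSTERED INDEX IX_Table_TimeStamp_Column2
    ON dbo.[Table] ([TimeStamp])
    WHERE Column2 IS NOT NULL;

-- Repeat the same pattern for Column3 and Column4.
```

The trade-off is extra write cost and storage for four additional indexes, which may only be worth it if this check runs often.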

This may end up being more cumbersome to write, but using EXISTS instead of COUNT may be more efficient, because EXISTS can stop at the first matching row:

SELECT  CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column1 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn1Data,
        CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column2 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn2Data,
        CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column3 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn3Data,
        CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column4 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn4Data

I would restructure the query to return one row per column, giving the column name and whether it contains any non-null value:

SELECT
  'COL1' AS column_name,
  CONVERT( BIT, COUNT( 1 ) ) AS is_not_entirely_null
FROM
  foo
WHERE
  column1 IS NOT NULL
UNION
SELECT
  'COL2' AS column_name,
  CONVERT( BIT, COUNT( 1 ) ) AS is_not_entirely_null
FROM
  foo
WHERE
  column2 IS NOT NULL

...

BTW, you should be able to auto-generate the above query with something like this (the last generated line will end in a dangling UNION that you need to remove):

SELECT
  'SELECT ''' + c.name + ''' AS column_name, CONVERT( BIT, COUNT( 1 ) ) AS is_not_entirely_null FROM ' + t.name + ' WHERE ' + c.name + ' IS NOT NULL UNION'
FROM 
  sysobjects AS t,
  syscolumns AS c
WHERE
  t.name = 'foo' AND
  c.id = t.id
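A variation on the same idea, in case it helps: the sketch below builds the whole statement in one string and runs it, which avoids the dangling UNION. It assumes SQL Server 2017 or later (for `STRING_AGG`), assumes the table is `dbo.foo`, and uses the `sys.columns` catalog view instead of the older `syscolumns` compatibility view. The variable name `@sql` is just illustrative.

```sql
-- Assumptions: SQL Server 2017+ (STRING_AGG) and a table named dbo.foo.
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = STRING_AGG(
           CAST(N'SELECT ' + QUOTENAME(c.name, '''') + N' AS column_name, '
              + N'CONVERT(BIT, COUNT(1)) AS is_not_entirely_null '
              + N'FROM dbo.foo WHERE ' + QUOTENAME(c.name) + N' IS NOT NULL'
              AS NVARCHAR(MAX)),          -- cast so the result is not capped at 8000 bytes
           N' UNION ALL ')                -- separator only appears between branches
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.foo');

-- Inspect the generated text first if you like: PRINT @sql;
EXEC sys.sp_executesql @sql;
```

`QUOTENAME` is used twice on purpose: once with a single quote to produce a safely escaped string literal, and once with the default brackets to produce a safely delimited column name.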

You could use this query:

SELECT
max(CASE WHEN t.Column1 IS NULL THEN 0 ELSE 1 END ) AS HasColumn1Data,
max(CASE WHEN t.Column2 IS NULL THEN 0 ELSE 1 END ) AS HasColumn2Data,
max(CASE WHEN t.Column3 IS NULL THEN 0 ELSE 1 END ) AS HasColumn3Data,
max(CASE WHEN t.Column4 IS NULL THEN 0 ELSE 1 END ) AS HasColumn4Data
FROM dbo.Table AS t
WHERE t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp
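To see how the MAX(CASE ...) pattern behaves, here is a small self-contained sketch against a table variable. The table `@t` and its sample rows are invented for the demonstration. It also wraps each MAX in COALESCE, because an aggregate over zero rows returns NULL rather than 0, which matters if the time range matches nothing.

```sql
-- Hypothetical sample data: Column1 has one value, Column2 is entirely NULL.
DECLARE @t TABLE (Column1 INT NULL, Column2 INT NULL);
INSERT INTO @t (Column1, Column2) VALUES (NULL, NULL), (5, NULL);

-- One pass over the rows; each flag is 1 if any row had a non-null value.
SELECT
    COALESCE(MAX(CASE WHEN t.Column1 IS NULL THEN 0 ELSE 1 END), 0) AS HasColumn1Data,
    COALESCE(MAX(CASE WHEN t.Column2 IS NULL THEN 0 ELSE 1 END), 0) AS HasColumn2Data
FROM @t AS t;
-- HasColumn1Data = 1, HasColumn2Data = 0

-- With no qualifying rows, MAX alone would yield NULL; COALESCE turns that into 0.
SELECT
    COALESCE(MAX(CASE WHEN t.Column1 IS NULL THEN 0 ELSE 1 END), 0) AS HasColumn1Data
FROM @t AS t
WHERE 1 = 0;
-- HasColumn1Data = 0
```

Like the COUNT version in the question, this still reads every row in the range; it does not stop early.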

You could try something like this:

;WITH cte AS (
  SELECT * FROM dbo.Table WHERE TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp
)
-- Each TOP 1 probe sits in its own scalar subquery. With CROSS JOIN, a single
-- empty probe would empty the whole join and make every count come back as 0.
SELECT
  (SELECT COUNT(*) FROM (SELECT TOP 1 Col1 FROM cte WHERE Col1 IS NOT NULL) s1) AS Col1,
  (SELECT COUNT(*) FROM (SELECT TOP 1 Col2 FROM cte WHERE Col2 IS NOT NULL) s2) AS Col2,
  (SELECT COUNT(*) FROM (SELECT TOP 1 Col3 FROM cte WHERE Col3 IS NOT NULL) s3) AS Col3,
  (SELECT COUNT(*) FROM (SELECT TOP 1 Col4 FROM cte WHERE Col4 IS NOT NULL) s4) AS Col4

This has a potential advantage if every column contains at least one non-null value: in that case each subquery scans the table only until it reaches its first non-null row (although that happens four times). If any column (or worse, every column) is null in all rows, you get a full scan for each such column. In short, this is mainly useful when you expect the columns to contain data.
