I want to write a TSQL query that independently checks a set of columns in a table to see which ones contain at least one non-null value. Each column's check should return T/F (1/0) accordingly.
The first thing that came to mind was to use the COUNT aggregate function. Since COUNT(expression) excludes nulls from the resulting total, if the COUNT is > 0, there's non-null data.
This seems a bit heavy-handed in that it has to count all data. I really just need to know if there's at least one non-null value in each column:
SELECT
CAST(CASE WHEN COUNT(t.Column1) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn1Data,
CAST(CASE WHEN COUNT(t.Column2) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn2Data,
CAST(CASE WHEN COUNT(t.Column3) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn3Data,
CAST(CASE WHEN COUNT(t.Column4) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn4Data
FROM dbo.Table AS t
WHERE t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp
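As a small illustration of the COUNT(expression) behaviour this relies on, here is a sketch against a made-up table variable. @Sample and its data are assumptions for demonstration only, not part of the real schema:

```sql
-- Hypothetical sample data; @Sample and its columns are illustrative only.
DECLARE @Sample TABLE (Column1 INT NULL, Column2 INT NULL);

INSERT INTO @Sample (Column1, Column2)
VALUES (NULL, NULL), (5, NULL), (NULL, NULL);

SELECT
    COUNT(*)         AS TotalRows,        -- counts every row, NULLs included
    COUNT(s.Column1) AS Column1NonNulls,  -- counts only non-NULL values
    COUNT(s.Column2) AS Column2NonNulls,  -- every value is NULL here
    CAST(CASE WHEN COUNT(s.Column1) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn1Data,
    CAST(CASE WHEN COUNT(s.Column2) > 0 THEN 1 ELSE 0 END AS BIT) AS HasColumn2Data
FROM @Sample AS s;
```

With this sample data the single result row is 3, 1, 0, 1, 0: three rows in total, one non-null value in Column1 and none in Column2.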
Any ideas that might be more optimal?
If you have indexes on the columns, the following may be faster:
select (case when exists (select 1
from table t
where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
column1 is not null
)
then 1 else 0 end) as HasColumn1Data,
(case when exists (select 1
from table t
where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
column2 is not null
)
then 1 else 0 end) as HasColumn2Data,
(case when exists (select 1
from table t
where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
column3 is not null
)
then 1 else 0 end) as HasColumn3Data,
(case when exists (select 1
from table t
where t.TimeStamp BETWEEN @StartTimeStamp and @EndTimeStamp and
column4 is not null
)
then 1 else 0 end) as HasColumn4Data;
Without indexes, this would be about 4 full-table scans (admittedly, each one stops at the first non-NULL value), so it would probably be slower than the single aggregation query in the question.
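For what "indexes on the columns" could look like, here is a rough sketch of one option, a filtered index per column. The index names are invented, the table is assumed to be dbo.[Table] as in the question, and filtered indexes need SQL Server 2008 or later. Whether the optimizer actually picks them up is something to confirm in the execution plan:

```sql
-- Assumption: table and column names follow the question; index names are made up.
-- Each filtered index contains only the rows where its column is non-NULL,
-- so an EXISTS probe can seek on TimeStamp and stop at the first match.
CREATE NONCLUSTERED INDEX IX_Table_TimeStamp_Column1
    ON dbo.[Table] ([TimeStamp])
    WHERE Column1 IS NOT NULL;

CREATE NONCLUSTERED INDEX IX_Table_TimeStamp_Column2
    ON dbo.[Table] ([TimeStamp])
    WHERE Column2 IS NOT NULL;

-- ...and likewise for Column3 and Column4.
```

The trade-off is four extra indexes to maintain on every write, which is only worth it if this check runs often.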
This may end up being more cumbersome, but using an EXISTS instead of a COUNT may be more optimal:
SELECT CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column1 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn1Data,
CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column2 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn2Data,
CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column3 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn3Data,
CAST(CASE WHEN EXISTS(SELECT * FROM Table t WHERE t.Column4 IS NOT NULL AND t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp) THEN 1 ELSE 0 END AS BIT) AS HasColumn4Data
I would pivot the query to produce only field names and whether they're non-null:
SELECT
'COL1' AS column_name,
CONVERT( BIT, COUNT( 1 ) ) AS is_not_entirely_null
FROM
foo
WHERE
column1 IS NOT NULL
UNION
SELECT
'COL2' AS column_name,
CONVERT( BIT, COUNT( 1 ) ) AS is_not_entirely_null
FROM
foo
WHERE
column2 IS NOT NULL
...
BTW, you should be able to auto-generate the above query with something like this:
SELECT
'SELECT ''' + c.name + ''' AS column_name, CONVERT( BIT, COUNT( 1 ) ) AS is_not_entirely_null FROM ' + t.name + ' WHERE ' + c.name + ' IS NOT NULL UNION'
FROM
sysobjects AS t,
syscolumns AS c
WHERE
t.name = 'foo' AND
c.id = t.id
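On newer SQL Server versions the same generator can be written against the sys.columns catalog view instead of the older sysobjects/syscolumns compatibility views. This is only a sketch: the table name dbo.foo is carried over from the example above, and it assumes column names contain no single quotes:

```sql
-- Assumption: the target table is dbo.foo, as in the example above.
-- QUOTENAME brackets the identifiers so unusual names still produce valid SQL.
SELECT
    'SELECT ''' + c.name + ''' AS column_name, '
    + 'CONVERT(BIT, COUNT(1)) AS is_not_entirely_null '
    + 'FROM ' + QUOTENAME(OBJECT_SCHEMA_NAME(c.object_id))
    + '.' + QUOTENAME(OBJECT_NAME(c.object_id))
    + ' WHERE ' + QUOTENAME(c.name) + ' IS NOT NULL UNION ALL'
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.foo');
```

As with the original, the trailing UNION ALL on the last generated line has to be removed by hand before running the result.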
You could use this query
SELECT
max(CASE WHEN t.Column1 IS NULL THEN 0 ELSE 1 END ) AS HasColumn1Data,
max(CASE WHEN t.Column2 IS NULL THEN 0 ELSE 1 END ) AS HasColumn2Data,
max(CASE WHEN t.Column3 IS NULL THEN 0 ELSE 1 END ) AS HasColumn3Data,
max(CASE WHEN t.Column4 IS NULL THEN 0 ELSE 1 END ) AS HasColumn4Data
FROM dbo.Table AS t
WHERE t.TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp
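One thing to watch with MAX: if no rows fall inside the time range at all, MAX returns NULL rather than 0, and the result is an INT rather than a BIT. A possible variant that handles both is sketched below. It assumes @StartTimeStamp and @EndTimeStamp are declared as in the question, and that a missing range should count as "no data":

```sql
-- Assumption: @StartTimeStamp / @EndTimeStamp are declared by the caller.
-- ISNULL maps the "no rows matched" case (MAX = NULL) to 0 before the BIT cast.
SELECT
    CAST(ISNULL(MAX(CASE WHEN t.Column1 IS NULL THEN 0 ELSE 1 END), 0) AS BIT) AS HasColumn1Data,
    CAST(ISNULL(MAX(CASE WHEN t.Column2 IS NULL THEN 0 ELSE 1 END), 0) AS BIT) AS HasColumn2Data,
    CAST(ISNULL(MAX(CASE WHEN t.Column3 IS NULL THEN 0 ELSE 1 END), 0) AS BIT) AS HasColumn3Data,
    CAST(ISNULL(MAX(CASE WHEN t.Column4 IS NULL THEN 0 ELSE 1 END), 0) AS BIT) AS HasColumn4Data
FROM dbo.[Table] AS t
WHERE t.[TimeStamp] BETWEEN @StartTimeStamp AND @EndTimeStamp;
```

Like the COUNT version, this still reads every row in the range once, so it is a tidier form of the same single pass rather than a way to short-circuit.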
You could try something like this:
;WITH cte AS (
SELECT * FROM dbo.Table WHERE TimeStamp BETWEEN @StartTimeStamp AND @EndTimeStamp
)
-- Scalar subqueries rather than CROSS JOIN: a cross join returns no rows
-- if any one column is entirely NULL, which would zero out every count.
SELECT
(SELECT COUNT(*) FROM (SELECT TOP 1 Col1 FROM cte WHERE Col1 IS NOT NULL) AS s1) AS Col1,
(SELECT COUNT(*) FROM (SELECT TOP 1 Col2 FROM cte WHERE Col2 IS NOT NULL) AS s2) AS Col2,
(SELECT COUNT(*) FROM (SELECT TOP 1 Col3 FROM cte WHERE Col3 IS NOT NULL) AS s3) AS Col3,
(SELECT COUNT(*) FROM (SELECT TOP 1 Col4 FROM cte WHERE Col4 IS NOT NULL) AS s4) AS Col4
This has a potential advantage if every column contains at least one non-null value: in that case each probe scans only until it finds its first non-null row (though it does so four times). If any column (or worse, every column) is null for all rows, you get a full scan for each such column. In short, this is most useful when you expect the data to contain values.