I have an Audit table where we record changes to fields in our database. I have a query that pulls a subset of that data: a few columns, their recorded changes, and when each change happened, for the applicable IDs. Here is a sample of the output:
ID ada IsHD HDF DTStamp
-----------------------------------------------------
68 NULL 0 0 2020-04-28 21:12:21.287
68 NULL NULL NULL 2020-04-17 14:59:49.700
68 No/Unsure NULL NULL 2020-04-17 14:03:46.160
68 NULL 0 0 2020-04-17 13:49:49.720
102 NULL NULL NULL 2020-04-30 13:11:15.273
102 No/Unsure NULL NULL 2020-04-20 16:00:35.410
102 NULL 1 1 2020-04-20 15:59:55.750
105 No/Unsure 1 1 2020-04-17 12:06:10.833
105 NULL NULL NULL 2020-04-13 07:51:30.180
126 NULL NULL NULL 2020-05-01 17:59:24.460
126 NULL 0 0 2020-04-28 21:12:21.287
What I am trying to figure out is the most efficient way to "roll up" the multiple rows for a given ID so that, for each column, the newest non-NULL value is kept, leaving a single row per ID.
That is, turn this:
68 NULL 0 0 2020-04-28 21:12:21.287
68 NULL NULL NULL 2020-04-17 14:59:49.700
68 No/Unsure NULL NULL 2020-04-17 14:03:46.160
68 NULL 0 0 2020-04-17 13:49:49.720
102 NULL NULL NULL 2020-04-30 13:11:15.273
102 No/Unsure NULL NULL 2020-04-20 16:00:35.410
102 NULL 1 1 2020-04-20 15:59:55.750
Into this:
68 No/Unsure 0 0 2020-04-28 21:12:21.287
102 No/Unsure 1 1 2020-04-30 13:11:15.273
...and so on down the list. It's as if you pushed down on the top of the results and squeezed out all the NULLs.
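The roll-up rule described above can be stated precisely without any SQL. This is a minimal Python sketch, assuming rows arrive as (ID, ada, IsHD, HDF, DTStamp) tuples with None standing in for NULL; the function name `roll_up` is hypothetical. For each ID it keeps, per column, the value from the newest row where that column is non-NULL, plus the newest DTStamp overall.

```python
from datetime import datetime

# Sample rows from the question: (ID, ada, IsHD, HDF, DTStamp); None stands for NULL.
ROWS = [
    (68, None, 0, 0, datetime(2020, 4, 28, 21, 12, 21, 287000)),
    (68, None, None, None, datetime(2020, 4, 17, 14, 59, 49, 700000)),
    (68, "No/Unsure", None, None, datetime(2020, 4, 17, 14, 3, 46, 160000)),
    (68, None, 0, 0, datetime(2020, 4, 17, 13, 49, 49, 720000)),
    (102, None, None, None, datetime(2020, 4, 30, 13, 11, 15, 273000)),
    (102, "No/Unsure", None, None, datetime(2020, 4, 20, 16, 0, 35, 410000)),
    (102, None, 1, 1, datetime(2020, 4, 20, 15, 59, 55, 750000)),
]


def roll_up(rows):
    """Return {ID: (ada, IsHD, HDF, DTStamp)} keeping the newest non-NULL value per column."""
    result = {}
    # Visit rows newest-first so the first non-NULL value seen for a column is the latest one.
    for row_id, *values, stamp in sorted(rows, key=lambda r: r[-1], reverse=True):
        if row_id not in result:
            # The first (newest) row seen for an ID fixes DTStamp, i.e. MAX(DTStamp).
            result[row_id] = [None, None, None, stamp]
        current = result[row_id]
        for i, value in enumerate(values):
            # Only fill a column that is still empty; 0 is a real value, not NULL.
            if current[i] is None and value is not None:
                current[i] = value
    return {k: tuple(v) for k, v in result.items()}


for row_id, rolled in sorted(roll_up(ROWS).items()):
    print(row_id, *rolled)
```

Note that the check is `is None` rather than truthiness, so the 0 values in IsHD and HDF survive the roll-up.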
After dumping the above results into a table variable, @audit, I run the following query:
SELECT DISTINCT a.[ID]
     , (SELECT TOP 1 [ADA]
          FROM @audit
         WHERE [ID] = a.[ID]
           AND [ADA] IS NOT NULL
         ORDER BY [DTStamp] DESC) AS 'ADA'
     , (SELECT TOP 1 [IsHD]
          FROM @audit
         WHERE [ID] = a.[ID]
           AND [IsHD] IS NOT NULL
         ORDER BY [DTStamp] DESC) AS 'IsHD'
     , (SELECT TOP 1 [HDF]
          FROM @audit
         WHERE [ID] = a.[ID]
           AND [HDF] IS NOT NULL
         ORDER BY [DTStamp] DESC) AS 'HDF'
     , (SELECT MAX([DTStamp])
          FROM @audit
         WHERE [ID] = a.[ID]) AS 'DTStamp'
  FROM @audit a
 ORDER BY [ID]
This is what I've come up with, and it does work, but it feels clunky and inefficient. Is there a better way to accomplish this?
If you want one row per id, then use aggregation:
select id, max(ada), max(IsHD), max(HDF), max(DTStamp)
from @audit a
group by id;
This works for the data you have provided and seems to fit the rule that you want.
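One thing worth checking before relying on plain aggregation: MAX() returns the largest value in each column, not the most recent one, so it matches the desired roll-up only when the newest non-NULL value also happens to be the largest, as it does in the sample above. The sketch below illustrates the difference in SQLite via Python's sqlite3 module. The rows are hypothetical (not from the question), with IsHD changing from 1 to 0; the second query is the question's correlated-subquery approach adapted to SQLite (LIMIT 1 instead of TOP 1, and a regular table named audit instead of the @audit table variable).

```python
import sqlite3

# Hypothetical audit rows (not from the question): IsHD was 1, then later changed to 0.
ROWS = [
    (200, None, 1, 1, "2020-04-20 10:00:00.000"),
    (200, "No/Unsure", 0, None, "2020-04-25 10:00:00.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (ID INTEGER, ada TEXT, IsHD INTEGER, HDF INTEGER, DTStamp TEXT)")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?, ?)", ROWS)

# Plain aggregation: MAX picks the largest value per column, regardless of time.
by_max = conn.execute(
    "SELECT ID, MAX(ada), MAX(IsHD), MAX(HDF), MAX(DTStamp) FROM audit GROUP BY ID"
).fetchall()

# The question's approach, adapted to SQLite: newest non-NULL value per column.
by_latest = conn.execute(
    """
    SELECT DISTINCT a.ID,
           (SELECT ada FROM audit WHERE ID = a.ID AND ada IS NOT NULL
             ORDER BY DTStamp DESC LIMIT 1) AS ada,
           (SELECT IsHD FROM audit WHERE ID = a.ID AND IsHD IS NOT NULL
             ORDER BY DTStamp DESC LIMIT 1) AS IsHD,
           (SELECT HDF FROM audit WHERE ID = a.ID AND HDF IS NOT NULL
             ORDER BY DTStamp DESC LIMIT 1) AS HDF,
           (SELECT MAX(DTStamp) FROM audit WHERE ID = a.ID) AS DTStamp
      FROM audit a
     ORDER BY a.ID
    """
).fetchall()

print(by_max)     # IsHD comes back as 1 (the largest value)
print(by_latest)  # IsHD comes back as 0 (the newest value)
```

DTStamp is stored as ISO-8601 text here, which sorts chronologically, so ORDER BY and MAX behave as they would on a datetime column.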
I understand that you want the "latest" non-null value per id for each column, using column DTStamp for ordering.
Your approach using multiple subqueries does what you want. An alternative would be to use multiple row_number()s and conditional aggregation. This might actually be more efficient, since it avoids multiple scans of the table.
select
    id,
    max(case when rn_ada = 1 then ada end) ada,
    max(case when rn_isHd = 1 then isHd end) isHd,
    max(case when rn_hdf = 1 then hdf end) hdf,
    max(DTStamp) DTStamp
from (
    select
        a.*,
        row_number() over(
            partition by id
            order by case when ada is not null then DTStamp end desc
        ) rn_ada,
        row_number() over(
            partition by id
            order by case when isHd is not null then DTStamp end desc
        ) rn_isHd,
        row_number() over(
            partition by id
            order by case when hdf is not null then DTStamp end desc
        ) rn_hdf
    from @audit a
) t
group by id
order by id
id | ada | isHd | hdf | DTStamp
--: | :-------- | ---: | --: | :----------------------
68 | No/Unsure | 0 | 0 | 2020-04-28 21:12:21.287
102 | No/Unsure | 1 | 1 | 2020-04-30 13:11:15.273
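To try the row_number() approach without SQL Server, the sketch below runs essentially the same query against all of the question's sample rows using Python's built-in sqlite3 module. Assumptions: a regular table named audit replaces the @audit table variable, DTStamp is stored as ISO-8601 text (which sorts chronologically), and the SQLite library is version 3.25 or later so window functions are available. Like SQL Server, SQLite sorts NULLs last under DESC, which is what makes rn = 1 land on the newest non-NULL value.

```python
import sqlite3

# The question's sample rows: (ID, ada, IsHD, HDF, DTStamp); None is NULL.
ROWS = [
    (68, None, 0, 0, "2020-04-28 21:12:21.287"),
    (68, None, None, None, "2020-04-17 14:59:49.700"),
    (68, "No/Unsure", None, None, "2020-04-17 14:03:46.160"),
    (68, None, 0, 0, "2020-04-17 13:49:49.720"),
    (102, None, None, None, "2020-04-30 13:11:15.273"),
    (102, "No/Unsure", None, None, "2020-04-20 16:00:35.410"),
    (102, None, 1, 1, "2020-04-20 15:59:55.750"),
    (105, "No/Unsure", 1, 1, "2020-04-17 12:06:10.833"),
    (105, None, None, None, "2020-04-13 07:51:30.180"),
    (126, None, None, None, "2020-05-01 17:59:24.460"),
    (126, None, 0, 0, "2020-04-28 21:12:21.287"),
]

ROLL_UP = """
SELECT ID,
       MAX(CASE WHEN rn_ada  = 1 THEN ada  END) AS ada,
       MAX(CASE WHEN rn_isHd = 1 THEN IsHD END) AS IsHD,
       MAX(CASE WHEN rn_hdf  = 1 THEN HDF  END) AS HDF,
       MAX(DTStamp) AS DTStamp
  FROM (
        SELECT a.*,
               -- Non-NULL rows sort by DTStamp, newest first; NULL sort keys go
               -- last under DESC, so rn = 1 marks the latest non-NULL value.
               ROW_NUMBER() OVER (PARTITION BY ID
                   ORDER BY CASE WHEN ada IS NOT NULL THEN DTStamp END DESC) AS rn_ada,
               ROW_NUMBER() OVER (PARTITION BY ID
                   ORDER BY CASE WHEN IsHD IS NOT NULL THEN DTStamp END DESC) AS rn_isHd,
               ROW_NUMBER() OVER (PARTITION BY ID
                   ORDER BY CASE WHEN HDF IS NOT NULL THEN DTStamp END DESC) AS rn_hdf
          FROM audit a
       ) t
 GROUP BY ID
 ORDER BY ID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (ID INTEGER, ada TEXT, IsHD INTEGER, HDF INTEGER, DTStamp TEXT)")
conn.executemany("INSERT INTO audit VALUES (?, ?, ?, ?, ?)", ROWS)

rolled = conn.execute(ROLL_UP).fetchall()
for row in rolled:
    print(row)
```

For IDs 68 and 102 this reproduces the result table above; ID 126, which never has a non-NULL ada, keeps NULL (None) in that column.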