Piggybacking this lovely question: Partition Function COUNT() OVER possible using DISTINCT
I wish to calculate a moving count of distinct value. Something along the lines of:
Count(distinct machine_id) over(partition by model order by _timestamp rows between 6 preceding and current row)
Obviously, SQL Server does not support the syntax. Unfortunately, I don't understand well enough (didn't internalize would be more accurate) how that dense_rank walk-around works:
dense_rank() over (partition by model order by machine_id)
+ dense_rank() over (partition by model order by machine_id)
- 1
and therefore I am not able tweak it to meet my need for a moving window. If I order by machine_id, would it be enough to order by _timestamp as well and use rows between
?
dense_rank()
gives the dense ranking of the the current record. When you run that with ASC
sort order first, you get the current record's dense rank (unique value rank) from the first element. When you run with DESC
order, then you get the current record's dense rank from the last record. Then you remove 1 because the dense ranking of the current record is counted twice. This gives the total unique values in the whole partition (and repeated for every row).
Since, dense_rank
does not support frames
, you can't use this solution directly. You need to generate the frame
by other means. One way could be JOIN
ing the same table with proper unique id
comparisons. Then, you can use dense_rank
on the combined version.
Please check out the following solution proposal. The assumption there is you have a unique record key ( record_id
) available in your table. If you don't have a unique key, add another CTE before the first CTE and generate a unique key for each record (using new_id()
function OR combining multiple columns using concat()
with delimiter in between to account for NULLs
)
; WITH cte AS (
SELECT
record_id
, record_id_6_record_earlier = LEAD(machine_id, 6, NULL) OVER (PARTITION BY model ORDER BY _timestamp)
, .... other columns
FROM mainTable
)
, cte2 AS (
SELECT
c.*
, DistinctCntWithin6PriorRec = dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp)
+ dense_rank() OVER (PARTITION BY c.model, c.record_id ORDER BY t._timestamp DESC)
- 1
, RN = ROW_NUMBER() OVER (PARTITION BY c.record_id ORDER BY t._timestamp )
FROM cte c
LEFT JOIN mainTable t ON t.record_id BETWEEN c.record_id_6_record_earlier and c.record_id
)
SELECT *
FROM cte2
WHERE RN = 1
There are 2 LIMITATIONS of this solution:
If the frame has less than 6 records, then the LAG()
function will be NULL
and thus this solution will not work. This can be handled in different ways: One quick way I can think of is to generate 6 LEAD columns (1 record prior, 2 records prior, etc.) and then change the BETWEEN
clause to something like this BETWEEN COALESCE(c.record_id_6_record_earlier, c.record_id_5_record_earlier, ...., c.record_id_1_record_earlier, c.record_id) and c.record_id
COUNT()
does not count NULL
. But DENSE_RANK
does. You need account for that too if it applies to your data
Just use outer apply
:
select t.*, t2.num_machines
from t outer apply
(select count(distinct t2.machine_id) as num_machines
from (select top (6) t2.*
from t t2
where t2.model = t.model and
t2.timestamp <= t.timestamp
order by t2.timestamp desc
) t2
) t2;
If you have a lot of rows per model, you can also use a (cumbersome) trick using lag()
:
select t.*, v.num_machines
from (select t.*,
lag(machine_id, 1) over (partition by model order by timestamp) as machine_id_1,
lag(machine_id, 2) over (partition by model order by timestamp) as machine_id_2,
lag(machine_id, 3) over (partition by model order by timestamp) as machine_id_3,
lag(machine_id, 4) over (partition by model order by timestamp) as machine_id_4,
lag(machine_id, 5) over (partition by model order by timestamp) as machine_id_5
from t
) t cross apply
(select count(distinct v.machine_id) as num_machines
from (values (t.machine_id),
(t.machine_id_1),
(t.machine_id_2),
(t.machine_id_3),
(t.machine_id_4),
(t.machine_id_5)
) v(machine_id)
) v;
Under many circumstances, this might have the best performance in SQL Server.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.