Identify Consecutive Chunks in SQL Server Table

I have this table:

ValueId bigint // (identity) item ID
ListId bigint // group ID
ValueDelta int // item value
ValueCreated datetime2 // item created

I need to find runs of consecutive rows that share the same ValueDelta within the same ListId, where "consecutive" means adjacent when ordered by ValueCreated, not by ValueId. ValueCreated and ValueId are not guaranteed to be in the same order.

So the output should be:

ListId bigint
FirstId bigint // first ValueId of the run (ordered by ValueCreated within the list)
LastId bigint // last ValueId of the run (ordered by ValueCreated within the list)
ValueDelta int // the value shared by every row in the run
ValueCount int // number of rows in the run (from FirstId to LastId)

I can do this with Cursors but I'm sure that's not the best idea so I'm wondering if this can be done in a query.

If you post an answer, please explain it a bit.

UPDATE : SQLfiddle basic data set

Use a CTE that adds a Row_Number column, partitioned by GroupId and Value and ordered by Created.

Then select from the CTE and GROUP BY GroupId and Value. Use COUNT(*) to get the Count, and use correlated subqueries to get FirstId and LastId: select the ValueId with the MIN(RowNumber) (which will always be 1, so you can just use that instead of MIN) and the ValueId with the MAX(RowNumber).

Although, now that I've noticed you're using SQL Server 2017, you should be able to use First_Value() and Last_Value() instead of correlated subqueries.

It does look like a gaps-and-islands problem.

Here is one way to do it. It would likely work faster than your variant.

The standard idea for gaps-and-islands is to generate two sets of row numbers, partitioned in two different ways: rn1 numbers all rows of a list, and rn2 numbers the rows of each value within that list. The difference between these row numbers (rn1-rn2) stays the same within each consecutive chunk. Run the query below CTE-by-CTE and examine the intermediate results to see what is going on.

WITH
CTE_RN
AS
(
    SELECT
        [ValueId]
        ,[ListId]
        ,[ValueDelta]
        ,[ValueCreated]
        ,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY ListID, [ValueDelta] ORDER BY ValueCreated) AS rn2
    FROM [Value]
)
SELECT
    ListID
    ,MIN(ValueID) AS FirstID
    ,MAX(ValueID) AS LastID
    ,MIN(ValueCreated) AS FirstCreated
    ,MAX(ValueCreated) AS LastCreated
    ,ValueDelta
    ,COUNT(*) AS ValueCount
FROM CTE_RN
GROUP BY
    ListID
    ,ValueDelta
    ,rn1-rn2
ORDER BY
    FirstCreated
;

This query produces the same result as yours on your sample data set.
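
To see why grouping by rn1-rn2 works, it may help to look at the row numbers directly. The sketch below inlines the rows of the original sample data set (the CTE name SampleData is made up for illustration, so the query runs without the [Value] table) and lists rn1, rn2 and their difference. The commented result is what working through the sample by hand gives.

```sql
-- SampleData is a hypothetical stand-in for [Value], inlined so the query is self-contained.
WITH SampleData AS
(
    SELECT ValueId, ListId, ValueDelta, CAST(ValueCreated AS datetime2) AS ValueCreated
    FROM (VALUES
         (1, 1,  1, '2019-01-01 01:01:01')
        ,(2, 1,  0, '2019-01-01 01:02:01')
        ,(3, 1,  0, '2019-01-01 01:03:01')
        ,(4, 1,  0, '2019-01-01 01:04:01')
        ,(5, 1, -1, '2019-01-01 01:05:01')
        ,(6, 1, -1, '2019-01-01 01:06:01')
        ,(7, 1,  1, '2019-01-01 01:01:02')
        ,(8, 1,  1, '2019-01-01 01:08:01')
        ,(9, 2,  1, '2019-01-01 01:08:01')
    ) AS v (ValueId, ListId, ValueDelta, ValueCreated)
)
,CTE_RN AS
(
    SELECT
        ValueId, ListId, ValueDelta, ValueCreated
        -- position of the row within its list
        ,ROW_NUMBER() OVER (PARTITION BY ListId ORDER BY ValueCreated) AS rn1
        -- position of the row among rows of the same value within its list
        ,ROW_NUMBER() OVER (PARTITION BY ListId, ValueDelta ORDER BY ValueCreated) AS rn2
    FROM SampleData
)
SELECT ListId, ValueId, ValueDelta, rn1, rn2, rn1 - rn2 AS Diff
FROM CTE_RN
ORDER BY ListId, ValueCreated;

-- ListId  ValueId  ValueDelta  rn1  rn2  Diff
-- 1       1         1          1    1    0
-- 1       7         1          2    2    0
-- 1       2         0          3    1    2
-- 1       3         0          4    2    2
-- 1       4         0          5    3    2
-- 1       5        -1          6    1    5
-- 1       6        -1          7    2    5
-- 1       8         1          8    3    5
-- 2       9         1          1    1    0
```

Note that ValueId 5, 6 and 8 all end up with Diff = 5 even though 8 belongs to a different chunk. The difference alone is not unique, which is why the GROUP BY includes ValueDelta as well as rn1-rn2.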

It is not quite clear whether FirstID and LastID can simply be the MIN and MAX of ValueID, or whether they must come from the first and last rows of the chunk (when ordered by ValueCreated). If you really need the first and last rows, the query becomes a bit more complicated.


In your original sample data set the "first" and "min" for the FirstID are the same. Let's change the sample data set a little to highlight this difference:

insert into [Value]
([ListId], [ValueDelta], [ValueCreated])
values
(1, 1, '2019-01-01 01:01:02'), -- 1.1
(1, 0, '2019-01-01 01:02:01'), -- 2.1
(1, 0, '2019-01-01 01:03:01'), -- 2.2
(1, 0, '2019-01-01 01:04:01'), -- 2.3
(1, -1, '2019-01-01 01:05:01'), -- 3.1
(1, -1, '2019-01-01 01:06:01'), -- 3.2
(1, 1, '2019-01-01 01:01:01'), -- 1.2
(1, 1, '2019-01-01 01:08:01'), -- 4.1
(2, 1, '2019-01-01 01:08:01') -- 5.1
;

All I did was swap the ValueCreated values of the first and seventh rows, so now the FirstID of the first group is 7 and the LastID is 1. Your query returns the correct result. My simple query above doesn't, because it reports MIN(ValueID) and MAX(ValueID) instead.

Here is a variant that produces the correct result. I decided to use the FIRST_VALUE and LAST_VALUE functions to get the appropriate IDs. Again, run the query CTE-by-CTE and examine the intermediate results to see what is going on. This variant produces the same result as your query even with the adjusted sample data set.

WITH
CTE_RN
AS
(
    SELECT
        [ValueId]
        ,[ListId]
        ,[ValueDelta]
        ,[ValueCreated]
        ,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY ListID, ValueDelta ORDER BY ValueCreated) AS rn2
    FROM [Value]
)
,CTE2
AS
(
    SELECT
        ValueId
        ,ListId
        ,ValueDelta
        ,ValueCreated
        ,rn1
        ,rn2
        ,rn1-rn2 AS Diff
        ,FIRST_VALUE(ValueID) OVER(
            PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstID
        ,LAST_VALUE(ValueID) OVER(
            PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastID
    FROM CTE_RN
)
SELECT
    ListID
    ,FirstID
    ,LastID
    ,MIN(ValueCreated) AS FirstCreated
    ,MAX(ValueCreated) AS LastCreated
    ,ValueDelta
    ,COUNT(*) AS ValueCount
FROM CTE2
GROUP BY
    ListID
    ,ValueDelta
    ,rn1-rn2
    ,FirstID
    ,LastID
ORDER BY FirstCreated;
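
The explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame in the query above matters mostly for LAST_VALUE. When a window has an ORDER BY but no frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE only sees rows up to the current one. A minimal sketch, using just the two rows of the first chunk from the adjusted data set (the CTE name Chunk and the output column names are made up for illustration):

```sql
-- Chunk is a hypothetical inline copy of the first chunk of the adjusted sample data.
WITH Chunk AS
(
    SELECT ValueId, CAST(ValueCreated AS datetime2) AS ValueCreated
    FROM (VALUES
         (7, '2019-01-01 01:01:01')
        ,(1, '2019-01-01 01:01:02')
    ) AS v (ValueId, ValueCreated)
)
SELECT
    ValueId
    -- default frame stops at the current row, so this is just the current ValueId
    ,LAST_VALUE(ValueId) OVER (ORDER BY ValueCreated) AS LastDefaultFrame
    -- full frame sees the whole partition, so this is the ValueId of the latest row
    ,LAST_VALUE(ValueId) OVER (ORDER BY ValueCreated
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastFullFrame
    -- for comparison: the largest ID, which is not the last one by ValueCreated here
    ,MAX(ValueId) OVER () AS MaxId
FROM Chunk
ORDER BY ValueCreated;

-- ValueId  LastDefaultFrame  LastFullFrame  MaxId
-- 7        7                 1              7
-- 1        1                 1              7
```

Only LastFullFrame is constant across the chunk and equal to the ID of the latest row, which is what lets the final SELECT group by FirstID and LastID.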

After many iterations I think I have a working solution. I'm absolutely sure it's far from optimal but it works.

Link is here : http://sqlfiddle.com/#!18/4ee9f/3

Sample data:

create table [Value]
(
    [ValueId] bigint not null identity(1,1),
    [ListId] bigint not null,
    [ValueDelta] int not null,
    [ValueCreated] datetime2 not null,
    constraint [PK_Value] primary key clustered ([ValueId])
);

insert into [Value]
([ListId], [ValueDelta], [ValueCreated])
values
(1, 1, '2019-01-01 01:01:01'), -- 1.1
(1, 0, '2019-01-01 01:02:01'), -- 2.1
(1, 0, '2019-01-01 01:03:01'), -- 2.2
(1, 0, '2019-01-01 01:04:01'), -- 2.3
(1, -1, '2019-01-01 01:05:01'), -- 3.1
(1, -1, '2019-01-01 01:06:01'), -- 3.2
(1, 1, '2019-01-01 01:01:02'), -- 1.2
(1, 1, '2019-01-01 01:08:01'), -- 4.1
(2, 1, '2019-01-01 01:08:01') -- 5.1
;

The Query that seems to work:

-- this is the actual order of data
select *
from [Value]
order by [ListId] asc, [ValueCreated] asc;

-- there are 5 sets here
-- set 1 ListId=1, ValueId=1&7, ValueDelta=1
-- set 2 ListId=1, ValueId=2-4, ValueDelta=0
-- set 3 ListId=1, ValueId=5-6, ValueDelta=-1
-- set 4 ListId=1, ValueId=8-8, ValueDelta=1
-- set 5 ListId=2, ValueId=9-9, ValueDelta=1

with [cte1] as
(
    select [v1].[ListId]
        ,[v2].[ValueId] as [FirstId], [v2].[ValueCreated] as [FirstCreated]
        ,[v1].[ValueId] as [LastId], [v1].[ValueCreated] as [LastCreated]
        ,isnull([v1].[ValueDelta], 0) as [ValueDelta]
    from [dbo].[Value] [v1]
        join [dbo].[Value] [v2] on [v2].[ListId] = [v1].[ListId]
            and isnull([v2].[ValueDelta], 0) = isnull([v1].[ValueDelta], 0) -- the sample table has no [ValueDeltaPrev] column
            and [v2].[ValueCreated] <= [v1].[ValueCreated] and not exists (
                select 1
                from [dbo].[Value] [v3]
                where 1=1
                    and ([v3].[ListId] = [v1].[ListId])
                    and ([v3].[ValueCreated] between [v2].[ValueCreated] and [v1].[ValueCreated])
                    and [v3].[ValueDelta] != [v1].[ValueDelta]
            )
), [cte2] as
(
    select [t1].*
    from [cte1] [t1]
    where not exists (select 1 from [cte1] [t2] where [t2].[ListId] = [t1].[ListId]
        and ([t1].[FirstId] != [t2].[FirstId] or [t1].[LastId] != [t2].[LastId])
        and [t1].[FirstCreated] between [t2].[FirstCreated] and [t2].[LastCreated]
        and [t1].[LastCreated] between [t2].[FirstCreated] and [t2].[LastCreated]
        )
)
select [ListId], [FirstId], [LastId], [FirstCreated], [LastCreated], [ValueDelta] as [ValueDelta]
    ,(select count(*) from [dbo].[Value] where [ListId] = [t].[ListId] and [ValueCreated] between [t].[FirstCreated] and [t].[LastCreated]) as [ValueCount]
from [cte2] [t];

How it works:

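  • join the table to itself on the same list and the same value, pairing each row with every older row (or with itself, which handles single-row sets), and keep only pairs that have no row with a different value between them
  • compare those candidate ranges with each other and discard any range that lies inside another one, keeping only the largest range for each set
  • once the largest ranges are identified, count the rows of the list whose ValueCreated falls within each range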

If anyone can find a better / friendlier solution, you get the answer.

PS : The dumb straightforward Cursor approach seems a lot faster than this. Still testing.
