Identify and categorize breaks in repeating series of 0s and 1s using SQL

I want to break each PID into ordered chunks based on the CID column. Note that OID determines the order of the rows within each PID subgroup.

GRP should be sequential. It should start at 0 if the first CID of a PID is 0, and at 1 if the first CID is 1.

PID         OID         CID         GRP
----------- ----------- ----------- -----------
1           1           0           NULL
1           2           0           NULL
1           3           1           NULL
1           4           1           NULL
1           5           1           NULL
1           6           0           NULL
1           7           0           NULL
2           1           0           NULL
2           2           0           NULL
2           3           1           NULL
2           4           1           NULL
2           5           0           NULL
2           6           1           NULL
2           7           1           NULL
2           8           0           NULL
3           1           1           NULL
3           2           1           NULL
3           3           1           NULL
3           4           1           NULL
3           5           0           NULL
3           6           1           NULL
3           7           1           NULL
3           8           0           NULL
3           9           0           NULL


The desired result for the table above is shown below. I would like to produce it using T-SQL, but I am not sure whether this kind of task is even possible. Essentially, I am trying to identify breaks in the repetition of the CID column: each time a break occurs, GRP should increase by 1.

PID         OID         CID         GRP
----------- ----------- ----------- -----------
1           1           0           0
1           2           0           0
1           3           1           1
1           4           1           1
1           5           1           1
1           6           0           2
1           7           0           2
2           1           0           0
2           2           0           0
2           3           1           1
2           4           1           1
2           5           0           2
2           6           1           3
2           7           1           3
2           8           0           4
3           1           1           1
3           2           1           1
3           3           1           1
3           4           1           1
3           5           0           2
3           6           1           3
3           7           1           3
3           8           0           4
3           9           0           4
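The rule described by the two tables can be written as a short reference function, which is handy for checking expected values by hand. This is a minimal Python sketch, not T-SQL. The name `grp_for` is hypothetical, and the function assumes its input is one PID's CID values, already sorted by OID:

```python
def grp_for(cids):
    """Return GRP for each CID of a single PID (cids assumed sorted by OID)."""
    grps = []
    grp = None
    for i, cid in enumerate(cids):
        if i == 0:
            grp = cid  # first run: GRP 0 if CID is 0, GRP 1 if CID is 1
        elif cid != cids[i - 1]:
            grp += 1  # CID differs from the previous row, so a new run starts
        grps.append(grp)
    return grps


# CID values from the question, per PID, in OID order
print(grp_for([0, 0, 1, 1, 1, 0, 0]))        # PID 1: [0, 0, 1, 1, 1, 2, 2]
print(grp_for([0, 0, 1, 1, 0, 1, 1, 0]))     # PID 2: [0, 0, 1, 1, 2, 3, 3, 4]
print(grp_for([1, 1, 1, 1, 0, 1, 1, 0, 0]))  # PID 3: [1, 1, 1, 1, 2, 3, 3, 4, 4]
```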

Here is sample code to create the table above (GRP is populated with the expected values):

CREATE TABLE ##SOLUTION (PID INT, OID INT, CID INT, GRP INT);
insert into ##SOLUTION values
-- PID 1
(1,1,0, 0),
(1,2,0, 0),
(1,3,1, 1),
(1,4,1, 1),
(1,5,1, 1),
(1,6,0, 2),
(1,7,0, 2),
-- PID 2
(2,1,0, 0),
(2,2,0, 0),
(2,3,1, 1),
(2,4,1, 1),
(2,5,0, 2),
(2,6,1, 3),
(2,7,1, 3),
(2,8,0, 4),
-- PID 3
(3,1,1, 1),
(3,2,1, 1),
(3,3,1, 1),
(3,4,1, 1),
(3,5,0, 2),
(3,6,1, 3),
(3,7,1, 3),
(3,8,0, 4),
(3,9,0, 4);

Thank you in advance

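You can use lag() to get the previous cid value within each pid. A conditional cumulative sum then creates the grouping you want. To make GRP start at 0 when the first cid is 0 and at 1 when it is 1, the first row of each pid (where lag() returns NULL) contributes its own cid instead of 1: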

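select s.pid, s.oid, s.cid,
       sum(case when prev_cid is null then cid   -- first row of each pid: start at 0 or 1
                when prev_cid = cid then 0       -- same cid as the previous row: same group
                else 1                           -- cid changed: start the next group
           end) over (partition by pid order by oid) as grp
from (select s.*,
             lag(cid) over (partition by pid order by oid) as prev_cid
      from ##SOLUTION s
     ) s
order by s.pid, s.oid;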
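To try the approach without SQL Server, here is a sketch that runs a lag() plus conditional running-sum query in Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, which is required for window functions. The table is named `solution` because SQLite has no `##` global temp tables, and `compute_grp` is a hypothetical helper. In this query, the first row of each pid contributes its own cid rather than 1. Without that case, GRP would always start at 1:

```python
import sqlite3

# CID values from the question, one string per PID, in OID order
CIDS = {1: "0011100", 2: "00110110", 3: "111101100"}

# Running sum of "did CID change?" flags; the first row of each pid adds its own cid
QUERY = """
select pid, oid, cid,
       sum(case when prev_cid is null then cid  -- first row of the pid: start at 0 or 1
                when prev_cid = cid then 0      -- same CID as the previous row
                else 1                          -- CID changed: next group
           end) over (partition by pid order by oid) as grp
from (select pid, oid, cid,
             lag(cid) over (partition by pid order by oid) as prev_cid
      from solution)
order by pid, oid
"""


def compute_grp():
    con = sqlite3.connect(":memory:")
    con.execute("create table solution (pid int, oid int, cid int, grp int)")
    con.executemany(
        "insert into solution (pid, oid, cid) values (?, ?, ?)",
        [(pid, oid, int(c)) for pid, s in CIDS.items() for oid, c in enumerate(s, start=1)],
    )
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in compute_grp():
    print(row)  # (pid, oid, cid, grp)
```

The grp column of the printed tuples should match the expected table in the question.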


I should add that SQL Server makes it really easy to update these values using updatable CTEs:

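with toupdate as (
      select s.*,
             sum(case when prev_cid is null then cid   -- first row of each pid: start at 0 or 1
                      when prev_cid = cid then 0       -- same cid as the previous row: same group
                      else 1                           -- cid changed: start the next group
                 end) over (partition by pid order by oid) as new_grp
      from (select s.*,
                   lag(cid) over (partition by pid order by oid) as prev_cid
            from ##SOLUTION s
           ) s
     )
update toupdate
     set grp = new_grp;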
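SQLite does not support updating through a CTE. When prototyping outside SQL Server, the same persistence step can instead be sketched with a correlated subquery in UPDATE. This is a substitute technique, not the updatable-CTE approach. The names `fill_grp` and `NEW_GRP` and the table name `solution` are illustrative assumptions, and SQLite 3.25 or later is assumed:

```python
import sqlite3

# CID values from the question, one string per PID, in OID order; GRP starts as NULL
CIDS = {1: "0011100", 2: "00110110", 3: "111101100"}

# lag() + running sum, exposed as (pid, oid, new_grp)
NEW_GRP = """
select pid, oid,
       sum(case when prev_cid is null then cid  -- first row of the pid: start at 0 or 1
                when prev_cid = cid then 0      -- same CID: same group
                else 1                          -- CID changed: next group
           end) over (partition by pid order by oid) as new_grp
from (select pid, oid, cid,
             lag(cid) over (partition by pid order by oid) as prev_cid
      from solution)
"""

# No updatable CTEs in SQLite, so look up each row's new_grp by (pid, oid)
UPDATE = f"""
update solution
set grp = (select n.new_grp
           from ({NEW_GRP}) as n
           where n.pid = solution.pid and n.oid = solution.oid)
"""


def fill_grp():
    con = sqlite3.connect(":memory:")
    con.execute("create table solution (pid int, oid int, cid int, grp int)")
    con.executemany(
        "insert into solution (pid, oid, cid) values (?, ?, ?)",
        [(pid, oid, int(c)) for pid, s in CIDS.items() for oid, c in enumerate(s, start=1)],
    )
    con.execute(UPDATE)  # grp was NULL; it now holds the computed groups
    rows = con.execute("select pid, oid, cid, grp from solution order by pid, oid").fetchall()
    con.close()
    return rows


for row in fill_grp():
    print(row)  # (pid, oid, cid, grp)
```

The subquery may be re-evaluated for every updated row. On large tables, it is likely faster to first materialize (pid, oid, new_grp) into a temporary table.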
