I'm working with a table that contains the following data:
ObjectId EventId EventDate
1 342 2017-10-27
1 342 2018-01-06
1 343 2018-04-18
1 401 2018-10-15
1 342 2018-11-12
1 342 2018-11-29
1 401 2018-12-10
1 342 2019-02-21
1 343 2019-04-23
1 401 2019-11-04
1 343 2020-02-15
2 342 2018-06-08
2 343 2018-09-18
2 342 2018-10-02
I need to flag the first record where all 3 events (identified by EventId values 342, 343, and 401) have occurred for an object (identified by ObjectId). Then, the process should start again with the remaining records. I've tried using window functions to get this to work, but the "starting over" step of identifying any additional occurrences is tripping me up.
The output of this algorithm performed on the above data set is:
ObjectId EventId EventDate EventsComplete
1 342 2017-10-27 0
1 342 2018-01-06 0
1 343 2018-04-18 0
1 401 2018-10-15 1
1 342 2018-11-12 0
1 342 2018-11-29 0
1 401 2018-12-10 0
1 342 2019-02-21 0
1 343 2019-04-23 1
1 401 2019-11-04 0
1 343 2020-02-15 0
2 342 2018-06-08 0
2 343 2018-09-18 0
2 342 2018-10-02 0
Here's a query that will create the data set in the example.
select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-01-06' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2018-04-18' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-10-15' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-12' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-29' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-12-10' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2019-02-21' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2019-04-23' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2019-11-04' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2020-02-15' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-06-08' as date) as EventDate
union select 2 as ObjectId, 343 as EventId, cast('2018-09-18' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-10-02' as date) as EventDate
The code below demonstrates another way to solve the problem using a CTE. The first phase adds a column (RN) to order the data for the next step and several flag columns (E342Done, ...) to indicate which event the row represents. The second phase uses a recursive CTE to process the rows in the correct order for each ObjectId. Since T-SQL cannot store the result of a boolean expression directly in a column, it is sometimes easier to use arithmetic on 0/1 integers to "fake" the logic.
-- Sample data.
declare @ObjectEvents as Table ( ObjectId Int, EventId Int, EventDate Date );
insert into @ObjectEvents ( ObjectId, EventId, EventDate ) values
( 1, 342, '2017-10-27' ),( 1, 342, '2018-01-06' ),( 1, 343, '2018-04-18' ),( 1, 401, '2018-10-15' ),( 1, 342, '2018-11-12' ),
( 1, 342, '2018-11-29' ),( 1, 401, '2018-12-10' ),( 1, 342, '2019-02-21' ),( 1, 343, '2019-04-23' ),( 1, 401, '2019-11-04' ),
( 1, 343, '2020-02-15' ),( 2, 342, '2018-06-08' ),( 2, 343, '2018-09-18' ),( 2, 342, '2018-10-02' );
select * from @ObjectEvents order by ObjectId, EventDate;
-- Do the deed.
with
OrderedEventsByObject as (
-- Number the rows for each ObjectId in EventDate order and add flags for the events.
select ObjectId, EventId, EventDate,
Row_Number() over ( partition by ObjectId order by EventDate ) as RN,
case when EventId = 342 then 1 else 0 end as E342Done,
case when EventId = 343 then 1 else 0 end as E343Done,
case when EventId = 401 then 1 else 0 end as E401Done
from @ObjectEvents ),
ProcessedEvents as (
-- Process the events in order for each ObjectId .
-- Start with the first row for the ObjectId ...
select ObjectId, EventId, EventDate, RN, E342Done, E343Done, E401Done,
0 as EventsComplete
from OrderedEventsByObject
where RN = 1
union all
-- ... then add the next row, if any, for each ObjectId :
select OEBO.ObjectId, OEBO.EventId, OEBO.EventDate, OEBO.RN,
-- Use arithmetic as a shorthand for: ( PE.E342Done or OEBO.E342Done ) and not PH.EventsComplete .
Sign( ( PE.E342Done + OEBO.E342Done ) * ( 1 - PH.EventsComplete ) ),
Sign( ( PE.E343Done + OEBO.E343Done ) * ( 1 - PH.EventsComplete ) ),
Sign( ( PE.E401Done + OEBO.E401Done ) * ( 1 - PH.EventsComplete ) ),
PH.EventsComplete
from ProcessedEvents as PE inner join
OrderedEventsByObject as OEBO on OEBO.ObjectId = PE.ObjectId and OEBO.RN = PE.RN + 1 cross apply
-- Use cross apply to make the EventsComplete column available within the recursive part of the CTE.
-- Arithmetic is used again to check for one of every event type being completed.
( select case when Sign( PE.E342Done + OEBO.E342Done ) + Sign( PE.E343Done + OEBO.E343Done ) + Sign( PE.E401Done + OEBO.E401Done ) = 3 then 1 else 0 end as EventsComplete ) as PH
)
-- You can uncomment the following select statements to see the intermediate results:
-- select * from OrderedEventsByObject;
-- select * from ProcessedEvents;
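select ObjectId, EventId, EventDate, EventsComplete
from ProcessedEvents
order by ObjectId, RN
-- The recursion advances one row per level and the default limit is 100 levels,
-- so lift the limit in case an ObjectId has more events than that.
option ( maxrecursion 0 );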
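The arithmetic stand-in for boolean logic used in the recursive step can be checked in isolation. The following is only a minimal sketch and not part of the query above: the column names A, B, C and the derived table Flags are made up for illustration, and the flags are assumed to be 0/1 integers.

```sql
-- Truth table for Sign( ( A + B ) * ( 1 - C ) ), the arithmetic shorthand
-- for: ( A or B ) and not C, where A, B and C are 0/1 integer flags.
-- A, B, C and Flags are hypothetical names used only for this illustration.
select A, B, C,
    Sign( ( A + B ) * ( 1 - C ) ) as AOrBAndNotC
from ( values
    ( 0, 0, 0 ), ( 1, 0, 0 ), ( 0, 1, 0 ), ( 1, 1, 0 ),
    ( 0, 0, 1 ), ( 1, 0, 1 ), ( 0, 1, 1 ), ( 1, 1, 1 )
    ) as Flags ( A, B, C )
order by C, A, B;
```

The sum A + B is positive when either flag is set, Sign collapses it back to 0 or 1, and multiplying by ( 1 - C ) forces the result to 0 whenever C is 1.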
There may be a way to do this with a CTE or straight-up SQL, but I wasn't able to come up with an effective solution using either of those.
The best solution I could come up with loads the data into a table variable and processes it RBAR (row by agonizing row) in a WHILE loop, without a cursor. It was the only way I could find to keep track of the current ObjectId's event states.
You can run the following in SSMS:
-- Declare a temporary table for housing the queried data.
DECLARE @Data TABLE ( ObjectId INT, EventId INT, EventDate DATE, EventsComplete BIT DEFAULT (0), pk INT IDENTITY(1,1) );
-- Fetch the queried data into a table variable for processing.
INSERT INTO @Data ( ObjectId, EventId, EventDate ) VALUES
( 1, 342, '2017-10-27' ),( 1, 342, '2018-01-06' ),( 1, 343, '2018-04-18' ),( 1, 401, '2018-10-15' ),( 1, 342, '2018-11-12' ),
( 1, 342, '2018-11-29' ),( 1, 401, '2018-12-10' ),( 1, 342, '2019-02-21' ),( 1, 343, '2019-04-23' ),( 1, 401, '2019-11-04' ),
( 1, 343, '2020-02-15' ),( 2, 342, '2018-06-08' ),( 2, 343, '2018-09-18' ),( 2, 342, '2018-10-02' );
/*
I'm inserting the sample data you provided, however, in your code you would simply SELECT/INSERT
the required data into the temporary table @Data while sorting on your ObjectId and EventDate.
*/
-- Declare some variables for processing.
DECLARE
@ObjectId INT,
@EventId INT,
@PrevObjId INT,
@Flag342 BIT,
@Flag343 BIT,
@Flag401 BIT;
-- For-each row in @Data (non-cursor)...
DECLARE @pk INT = 1;
WHILE @pk <= ( SELECT MAX ( pk ) FROM @Data ) BEGIN
-- Current row.
SELECT
@ObjectId = ObjectId,
@PrevObjId = ISNULL ( @PrevObjId, ObjectId ),
@EventId = EventId
FROM @Data WHERE pk = @pk;
-- Set the event flags.
IF @EventId = 342
SET @Flag342 = 1;
IF @EventId = 343
SET @Flag343 = 1;
IF @EventId = 401
SET @Flag401 = 1;
IF @ObjectId = @PrevObjId BEGIN
-- Check for a completed event.
IF ( @Flag342 = 1 AND @Flag343 = 1 AND @Flag401 = 1 ) BEGIN
-- Set the EventsComplete flag.
UPDATE @Data SET EventsComplete = 1 WHERE pk = @pk;
-- Reset the event flag values.
SELECT @Flag342 = 0, @Flag343 = 0, @Flag401 = 0;
END
END ELSE BEGIN
-- New ObjectId, reset the event flag values.
SELECT
@Flag342 = CASE WHEN @EventId = 342 THEN 1 ELSE 0 END,
@Flag343 = CASE WHEN @EventId = 343 THEN 1 ELSE 0 END,
@Flag401 = CASE WHEN @EventId = 401 THEN 1 ELSE 0 END;
END
-- Next row.
SELECT
@PrevObjId = @ObjectId,
@pk = ( @pk + 1 );
END
-- Return the updated resultset.
SELECT
ObjectId, EventId, EventDate, EventsComplete
FROM @Data ORDER BY pk;
Returns
+----------+---------+------------+----------------+
| ObjectId | EventId | EventDate | EventsComplete |
+----------+---------+------------+----------------+
| 1 | 342 | 2017-10-27 | 0 |
| 1 | 342 | 2018-01-06 | 0 |
| 1 | 343 | 2018-04-18 | 0 |
| 1 | 401 | 2018-10-15 | 1 |
| 1 | 342 | 2018-11-12 | 0 |
| 1 | 342 | 2018-11-29 | 0 |
| 1 | 401 | 2018-12-10 | 0 |
| 1 | 342 | 2019-02-21 | 0 |
| 1 | 343 | 2019-04-23 | 1 |
| 1 | 401 | 2019-11-04 | 0 |
| 1 | 343 | 2020-02-15 | 0 |
| 2 | 342 | 2018-06-08 | 0 |
| 2 | 343 | 2018-09-18 | 0 |
| 2 | 342 | 2018-10-02 | 0 |
+----------+---------+------------+----------------+
Set-based solution below.
No optimisation passes have been attempted other than using a bitfield. It works, and that's enough for me. I can see a few points of possible simplification.
I should add that, strictly speaking, this problem is currently underspecified: if two different events can occur on the same date, nothing defines the order in which we should treat them as having occurred. So the row number allocated in the first CTE is arbitrary in those cases. No such cases occur in the sample data.
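If same-date events did occur, one way to make the numbering deterministic would be to add a tie-breaker to the ORDER BY of the window function. The sketch below is only an illustration: breaking ties on eventid is an assumption (the question does not say which event should count first), and the three inline rows and the alias e are made-up test data, not rows from the question.

```sql
-- Hypothetical tie-break: events sharing an eventdate are ordered by eventid.
-- The inline rows are invented for illustration; two of them share a date.
select objectid, eventid, eventdate,
    rn = row_number() over (partition by objectid order by eventdate asc, eventid asc)
from (values
    (1, 401, cast('2018-10-15' as date)),
    (1, 343, cast('2018-10-15' as date)),
    (1, 342, cast('2018-01-06' as date))
    ) as e (objectid, eventid, eventdate)
order by objectid, rn;
```

With the extra sort key, the two rows dated 2018-10-15 always get the same relative row numbers, so the flagged row no longer depends on whichever order the engine happens to pick.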
Timings: using string-concatenated paths took 150 ms. Switching to bits instead of strings brought that to ~30 ms, which is still slower than the cursor (~15 ms).
select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
into t
union all select 1, 342, cast('2018-01-06' as date)
union all select 1, 343, cast('2018-04-18' as date)
union all select 1, 401, cast('2018-10-15' as date)
union all select 1, 342, cast('2018-11-12' as date)
union all select 1, 342, cast('2018-11-29' as date)
union all select 1, 401, cast('2018-12-10' as date)
union all select 1, 342, cast('2019-02-21' as date)
union all select 1, 343, cast('2019-04-23' as date)
union all select 1, 401, cast('2019-11-04' as date)
union all select 1, 343, cast('2020-02-15' as date)
union all select 2, 342, cast('2018-06-08' as date)
union all select 2, 343, cast('2018-09-18' as date)
union all select 2, 342, cast('2018-10-02' as date);
go
with numbered as
-- just adding a row number to make it easier to follow
(
select objectid,
eventid,
eventdate,
rn = row_number() over (partition by objectid order by eventdate asc),
bits = cast(power(2, case eventid when 342 then 0 when 343 then 1 else 2 end) as tinyint)
from t
),
paths as
-- the concatenated paths of distinct eventid for each row, as a bitfield
(
select n.objectid,
n.eventid,
n.eventdate,
root = n.rn,
n.rn,
bits
from numbered n
union all
select n.objectid,
n.eventid,
n.eventdate,
p.root,
n.rn,
p.bits | n.bits
from paths p
join numbered n on n.objectid = p.objectid
and n.rn > p.rn
and p.bits & n.bits = 0
),
candidates as
-- for each root, the first row whose path contains all 3 values (bits = 7)
(
select *
from (
select objectid,
root,
rn,
candidate = iif
(
rn = min(rn) over (partition by objectid, root),
1, 0
)
from paths
where bits = 7
) c
where c.candidate = 1
)
-- get the candidate rows where no earlier candidate in row number order
-- has a root-to-end path which overlaps the path for this candidate
-- (rn restarts at 1 for every objectid, so each comparison is scoped to one objectid)
select distinct
n.objectid,
n.eventid,
n.eventdate,
EventsComplete = isnull(c.candidate, 0)
from numbered n
left join candidates c on c.objectid = n.objectid
and c.rn = n.rn
and not exists
(
select *
from candidates prev
where prev.objectid = c.objectid
and prev.rn < c.rn
and prev.rn > c.root
and prev.root < c.rn
)
order by n.objectid,
n.eventdate,
n.eventid;
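The bitfield encoding used in the query above assigns one bit to each event (342 is 1, 343 is 2, anything else, here 401, is 4), so "all three events seen" reduces to a single comparison with 7. A minimal standalone sketch of that idea, with the variable name @seen chosen purely for illustration:

```sql
-- Each event owns one bit; OR-ing them together records which events have been seen.
-- @seen is a hypothetical variable name used only in this sketch.
declare @seen tinyint = 0;
set @seen |= 1;   -- event 342
set @seen |= 1;   -- a repeated 342 changes nothing
set @seen |= 4;   -- event 401
select @seen as bits, iif(@seen = 7, 1, 0) as AllThreeSeen;   -- 343 still missing
set @seen |= 2;   -- event 343
select @seen as bits, iif(@seen = 7, 1, 0) as AllThreeSeen;   -- all three bits set
```

Because OR is idempotent, repeated events need no special handling, and resetting the state after a completed set is just an assignment of 0.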
Pure cursor for the lulz.
declare @triplets table(objectid int, eventid int, eventdate date);
declare c cursor fast_forward for
select objectid, eventid, eventdate from t order by objectid, eventdate asc;
declare
@ob int, @prevob int, @event int, @dt date,
@bits tinyint = 0;
open c;
fetch next from c into @ob, @event, @dt;
while @@fetch_status = 0
begin
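-- On the first row and whenever the object changes, start a fresh bitfield.
if (@prevob is null or @ob <> @prevob)
select @bits = 0, @prevob = @ob;
-- Always record the current event, including the first event of each object.
if @event = 342 set @bits |= 1;
else if @event = 343 set @bits |= 2;
else if @event = 401 set @bits |= 4;
if (@bits = 7)
begin
insert @triplets values (@ob, @event, @dt);
set @bits = 0;
end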
fetch next from c into @ob, @event, @dt;
end
close c;
deallocate c;
select t.*, iif(tt.objectid is null, 0, 1) as EventsComplete
from t
left join @triplets tt on t.objectid = tt.objectid
and t.eventid = tt.eventid
and t.eventdate = tt.eventdate
order by t.objectid, t.eventdate;