
How can I identify which rows in a table have met a certain condition, but the condition is based on data in previous rows? Example provided

I'm working with a table that contains the following data:

ObjectId   EventId   EventDate
1          342       2017-10-27
1          342       2018-01-06
1          343       2018-04-18
1          401       2018-10-15
1          342       2018-11-12
1          342       2018-11-29
1          401       2018-12-10
1          342       2019-02-21
1          343       2019-04-23
1          401       2019-11-04
1          343       2020-02-15
2          342       2018-06-08
2          343       2018-09-18
2          342       2018-10-02

I need to flag the first record at which all 3 events (identified by EventId values 342, 343, and 401) have occurred for an object (identified by ObjectId). The process should then start over with the remaining records. I've tried using window functions to get this to work, but the "starting over" step, which has to find any additional occurrences, is tripping me up.

The expected output of this algorithm for the data set above is:

ObjectId   EventId   EventDate    EventsComplete
1          342       2017-10-27   0
1          342       2018-01-06   0
1          343       2018-04-18   0
1          401       2018-10-15   1
1          342       2018-11-12   0
1          342       2018-11-29   0
1          401       2018-12-10   0
1          342       2019-02-21   0
1          343       2019-04-23   1
1          401       2019-11-04   0
1          343       2020-02-15   0
2          342       2018-06-08   0
2          343       2018-09-18   0
2          342       2018-10-02   0

Here's a query that will create the data set in the example.

select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-01-06' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2018-04-18' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-10-15' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-12' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-29' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-12-10' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2019-02-21' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2019-04-23' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2019-11-04' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2020-02-15' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-06-08' as date) as EventDate
union select 2 as ObjectId, 343 as EventId, cast('2018-09-18' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-10-02' as date) as EventDate

The code below demonstrates another way to solve the problem using a CTE. The first phase adds a column (RN) to order the data for the next step and several flag columns (E342Done, ...) to indicate which event the row represents. The second phase uses a recursive CTE to process the rows in the correct order for each ObjectId. Since T-SQL isn't very good at Boolean logic inside expressions, it is sometimes easier to use arithmetic to "fake" the logic.
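As a small illustration of that arithmetic, the sketch below lists every combination of 0/1 inputs for one flag. The aliases Prev, Curr, Complete and NewFlag are illustrative names only and stand in for PE.E342Done, OEBO.E342Done and PH.EventsComplete.

```sql
-- Truth table for Sign( ( Prev + Curr ) * ( 1 - Complete ) ),
-- which behaves like: ( Prev or Curr ) and not Complete.
select Prev, Curr, Complete,
  Sign( ( Prev + Curr ) * ( 1 - Complete ) ) as NewFlag
  from ( values ( 0 ), ( 1 ) ) as P( Prev )
    cross join ( values ( 0 ), ( 1 ) ) as C( Curr )
    cross join ( values ( 0 ), ( 1 ) ) as D( Complete )
  order by Prev, Curr, Complete;
```

NewFlag is 1 only when Complete is 0 and at least one of Prev and Curr is 1, which is how the flags get reset on the row that completes a set.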

-- Sample data.
declare @ObjectEvents as Table ( ObjectId Int, EventId Int, EventDate Date );

insert into @ObjectEvents ( ObjectId, EventId, EventDate ) values
    ( 1, 342, '2017-10-27' ),( 1, 342, '2018-01-06' ),( 1, 343, '2018-04-18' ),( 1, 401, '2018-10-15' ),( 1, 342, '2018-11-12' ),
    ( 1, 342, '2018-11-29' ),( 1, 401, '2018-12-10' ),( 1, 342, '2019-02-21' ),( 1, 343, '2019-04-23' ),( 1, 401, '2019-11-04' ),
    ( 1, 343, '2020-02-15' ),( 2, 342, '2018-06-08' ),( 2, 343, '2018-09-18' ),( 2, 342, '2018-10-02' );

select * from @ObjectEvents order by ObjectId, EventDate;

-- Do the deed.
with
  OrderedEventsByObject as (
    -- Number the rows for each ObjectId in EventDate order and add flags for the events.
    select ObjectId, EventId, EventDate,
      Row_Number() over ( partition by ObjectId order by EventDate ) as RN,
      case when EventId = 342 then 1 else 0 end as E342Done,
      case when EventId = 343 then 1 else 0 end as E343Done,
      case when EventId = 401 then 1 else 0 end as E401Done
      from @ObjectEvents ),
  ProcessedEvents as (
    -- Process the events in order for each ObjectId.
    -- Start with the first row for the ObjectId ...
    select ObjectId, EventId, EventDate, RN, E342Done, E343Done, E401Done,
      0 as EventsComplete
      from OrderedEventsByObject
      where RN = 1
    union all
    -- ... then add the next row, if any, for each ObjectId:
    select OEBO.ObjectId, OEBO.EventId, OEBO.EventDate, OEBO.RN,
      -- Use arithmetic as a shorthand for: ( PE.E342Done or OEBO.E342Done ) and not PH.EventsComplete.
      Sign( ( PE.E342Done + OEBO.E342Done ) * ( 1 - PH.EventsComplete ) ),
      Sign( ( PE.E343Done + OEBO.E343Done ) * ( 1 - PH.EventsComplete ) ),
      Sign( ( PE.E401Done + OEBO.E401Done ) * ( 1 - PH.EventsComplete ) ),
      PH.EventsComplete
      from ProcessedEvents as PE inner join
        OrderedEventsByObject as OEBO on OEBO.ObjectId = PE.ObjectId and OEBO.RN = PE.RN + 1 cross apply
        -- Use cross apply to make the EventsComplete column available within the recursive part of the CTE.
        -- Arithmetic is used again to check for one of every event type being completed.
        ( select case when Sign( PE.E342Done + OEBO.E342Done ) + Sign( PE.E343Done + OEBO.E343Done ) + Sign( PE.E401Done + OEBO.E401Done ) = 3 then 1 else 0 end as EventsComplete ) as PH
     )
  -- To see the intermediate results, uncomment one of the following select statements and comment out the final select:
  -- select * from OrderedEventsByObject;
  -- select * from ProcessedEvents;
  select ObjectId, EventId, EventDate, EventsComplete
    from ProcessedEvents
    order by ObjectId, RN;
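One caveat with the recursive approach: the recursive member advances one row per ObjectId at each level, and SQL Server stops a recursive CTE after 100 levels by default. An ObjectId with more than roughly a hundred events would therefore fail unless the limit is raised. A minimal, self-contained sketch of the hint (the Numbers CTE is only a stand-in to demonstrate the behaviour):

```sql
-- Count to 500 with a recursive CTE. Without the hint the statement is
-- terminated once the default limit of 100 recursion levels is exhausted.
with Numbers as (
  select 1 as N
  union all
  select N + 1 from Numbers where N < 500 )
  select Count(*) as RowsProduced
    from Numbers
    option ( maxrecursion 0 ); -- 0 removes the limit; a positive value sets a cap instead.
```

In the query above, the same option clause would go after order by ObjectId, RN.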

There may be a way to do this with a CTE or straight-up SQL, but I wasn't able to come up with an effective solution using either of those.

The best solution I could come up with loads the data into a table variable and processes it RBAR (row by agonizing row) in a WHILE loop, without a cursor. It was the only way I could find to keep track of the event state for the current ObjectId.

You can run the following in SSMS:

-- Declare a temporary table for housing the queried data.
DECLARE @Data TABLE ( ObjectId INT, EventId INT, EventDate DATE, EventsComplete BIT DEFAULT (0), pk INT IDENTITY(1,1) );

-- Fetch the queried data into a table variable for processing.
INSERT INTO @Data ( ObjectId, EventId, EventDate ) VALUES
    ( 1, 342, '2017-10-27' ),( 1, 342, '2018-01-06' ),( 1, 343, '2018-04-18' ),( 1, 401, '2018-10-15' ),( 1, 342, '2018-11-12' ),
    ( 1, 342, '2018-11-29' ),( 1, 401, '2018-12-10' ),( 1, 342, '2019-02-21' ),( 1, 343, '2019-04-23' ),( 1, 401, '2019-11-04' ),
    ( 1, 343, '2020-02-15' ),( 2, 342, '2018-06-08' ),( 2, 343, '2018-09-18' ),( 2, 342, '2018-10-02' );

/*
    I'm inserting the sample data you provided, however, in your code you would simply SELECT/INSERT 
    the required data into the temporary table @Data while sorting on your ObjectId and EventDate.
*/

-- Declare some variables for processing.
DECLARE 
    @ObjectId INT, 
    @EventId INT, 
    @PrevObjId INT, 
    @Flag342 BIT,
    @Flag343 BIT,
    @Flag401 BIT;
    
-- For-each row in @Data (non-cursor)...
DECLARE @pk INT = 1;
WHILE @pk <= ( SELECT MAX ( pk ) FROM @Data ) BEGIN

    -- Current row.
    SELECT
        @ObjectId = ObjectId,
        @PrevObjId = ISNULL ( @PrevObjId, ObjectId ),
        @EventId = EventId
    FROM @Data WHERE pk = @pk;

    -- Set the event flags.
    IF @EventId = 342
        SET @Flag342 = 1;

    IF @EventId = 343
        SET @Flag343 = 1;

    IF @EventId = 401
        SET @Flag401 = 1;

    IF @ObjectId = @PrevObjId BEGIN

        -- Check for a completed event.
        IF ( @Flag342 = 1 AND @Flag343 = 1 AND @Flag401 = 1 ) BEGIN

            -- Set the EventsComplete flag.
            UPDATE @Data SET EventsComplete = 1 WHERE pk = @pk;

            -- Reset the event flag values.
            SELECT @Flag342 = 0, @Flag343 = 0, @Flag401 = 0;

        END

    END ELSE BEGIN

        -- New ObjectId, reset the event flag values.
        SELECT 
            @Flag342 = CASE WHEN @EventId = 342 THEN 1 ELSE 0 END, 
            @Flag343 = CASE WHEN @EventId = 343 THEN 1 ELSE 0 END, 
            @Flag401 = CASE WHEN @EventId = 401 THEN 1 ELSE 0 END;

    END

    -- Next row.
    SELECT
        @PrevObjId = @ObjectId,
        @pk = ( @pk + 1 );

END

-- Return the updated resultset.
SELECT
    ObjectId, EventId, EventDate, EventsComplete
FROM @Data ORDER BY pk;

Returns

+----------+---------+------------+----------------+
| ObjectId | EventId | EventDate  | EventsComplete |
+----------+---------+------------+----------------+
|        1 |     342 | 2017-10-27 |              0 |
|        1 |     342 | 2018-01-06 |              0 |
|        1 |     343 | 2018-04-18 |              0 |
|        1 |     401 | 2018-10-15 |              1 |
|        1 |     342 | 2018-11-12 |              0 |
|        1 |     342 | 2018-11-29 |              0 |
|        1 |     401 | 2018-12-10 |              0 |
|        1 |     342 | 2019-02-21 |              0 |
|        1 |     343 | 2019-04-23 |              1 |
|        1 |     401 | 2019-11-04 |              0 |
|        1 |     343 | 2020-02-15 |              0 |
|        2 |     342 | 2018-06-08 |              0 |
|        2 |     343 | 2018-09-18 |              0 |
|        2 |     342 | 2018-10-02 |              0 |
+----------+---------+------------+----------------+
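The loop above relies on pk following ObjectId and EventDate order. A minimal sketch of loading real data that way is shown here; @Source is a hypothetical stand-in for whatever table actually holds the events.

```sql
-- Hypothetical source rows, deliberately out of order.
declare @Source table ( ObjectId int, EventId int, EventDate date );
insert into @Source ( ObjectId, EventId, EventDate ) values
  ( 2, 342, '2018-06-08' ), ( 1, 343, '2018-04-18' ), ( 1, 342, '2017-10-27' );

declare @Data table ( ObjectId int, EventId int, EventDate date,
  EventsComplete bit default (0), pk int identity(1,1) );

-- With INSERT ... SELECT, the ORDER BY determines how identity values are assigned,
-- so pk ends up in ObjectId, EventDate order.
insert into @Data ( ObjectId, EventId, EventDate )
  select ObjectId, EventId, EventDate
    from @Source
    order by ObjectId, EventDate;

select pk, ObjectId, EventId, EventDate from @Data order by pk;
```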

Set-based solution below.

No optimisation passes have been attempted other than using a bitfield. It works, and that's enough for me, though I can see a few points of possible simplification.

I should add that, really, this problem is currently undefined: if two different events can occur on the same date, nothing defines the order in which we should treat them as having occurred. The row number allocated in the first CTE is therefore arbitrary in those cases. No such cases occur in the sample data.
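One way to make the numbering deterministic is to add a tie-breaker to the ordering. The sketch below breaks same-date ties by EventId, which is purely an assumption for illustration since the question does not say which order is wanted; @Ties is a made-up table holding one such tie.

```sql
-- Two events for the same object on the same date: ordering by EventDate alone
-- would leave their relative row numbers up to the engine.
declare @Ties table ( ObjectId int, EventId int, EventDate date );
insert into @Ties ( ObjectId, EventId, EventDate ) values
  ( 1, 401, '2018-10-15' ), ( 1, 343, '2018-10-15' ), ( 1, 342, '2018-01-06' );

-- Adding EventId to the ORDER BY makes the numbering repeatable.
select ObjectId, EventId, EventDate,
  row_number() over ( partition by ObjectId order by EventDate, EventId ) as rn
  from @Ties
  order by ObjectId, rn;
```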

Using string-concatenated paths: 150 ms.

Switching to bits instead of strings brings that down to ~30 ms, which is still slower than the cursor (~15 ms).

select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
into t
union all select 1, 342, cast('2018-01-06' as date)
union all select 1, 343, cast('2018-04-18' as date)
union all select 1, 401, cast('2018-10-15' as date)
union all select 1, 342, cast('2018-11-12' as date)
union all select 1, 342, cast('2018-11-29' as date)
union all select 1, 401, cast('2018-12-10' as date)
union all select 1, 342, cast('2019-02-21' as date)
union all select 1, 343, cast('2019-04-23' as date)
union all select 1, 401, cast('2019-11-04' as date)
union all select 1, 343, cast('2020-02-15' as date)
union all select 2, 342, cast('2018-06-08' as date)
union all select 2, 343, cast('2018-09-18' as date)
union all select 2, 342, cast('2018-10-02' as date);
go

with numbered as  
-- just adding a row number to make it easier to follow
(
   select   objectid, 
            eventid, 
            eventdate, 
            rn = row_number() over (partition by objectid order by eventdate asc),
            bits = cast(power(2, case eventid when 342 then 0 when 343 then 1 else 2 end) as tinyint)
   from     t
),
paths as  
-- the concatenated paths of distinct eventid for each row, as a bitfield
(
   select      n.objectid, 
               n.eventid, 
               n.eventdate, 
               root = n.rn, 
               n.rn, 
               bits
   from        numbered n
   union all   
   select      n.objectid, 
               n.eventid, 
               n.eventdate, 
               p.root, 
               n.rn, 
               p.bits | n.bits
   from        paths       p
   join        numbered    n  on n.objectid = p.objectid
                                 and n.rn > p.rn
                                 and p.bits & n.bits = 0 
),
candidates as 
-- for each root, the earliest row whose path contains all 3 values (bits = 7)
(
   select   *
   from     (
               select   objectid,
                        root, 
                        rn,
                        candidate = iif
                        (
                           -- rn restarts for every object, so partition by objectid as well
                           rn = min(rn) over (partition by objectid, root), 
                           1, 0
                        )
               from     paths
               where    bits = 7
            ) c            
   where    c.candidate = 1
)
-- get the candidate rows where no earlier candidate (in row number order) for the
-- same object has a root-to-end path which overlaps the path for this candidate
select      distinct 
            n.objectid,
            n.eventid,
            n.eventdate,
            eventscomplete = isnull(c.candidate, 0)
from        numbered   n
left join   candidates c on c.objectid = n.objectid
                            and c.rn = n.rn
                            and not exists 
                            (
                               select * 
                               from candidates prev
                               where prev.objectid = c.objectid
                                     and prev.rn < c.rn
                                     and prev.rn > c.root
                                     and prev.root < c.rn
                            )
order by    n.objectid, 
            n.eventdate, 
            n.eventid;

Pure cursor for the lulz.


declare @triplets table(objectid int, eventid int, eventdate date);
declare c cursor fast_forward for 
select objectid, eventid, eventdate from t order by objectid, eventdate asc;
declare 
   @ob int, @prevob int, @event int, @dt date, 
   @bits tinyint = 0;
open c;
fetch next from c into @ob, @event, @dt;
while @@fetch_status = 0
begin
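    -- On the first row, or when the object changes, start a fresh bitfield.
    if (@prevob is null or @ob <> @prevob)
        select @bits = 0, @prevob = @ob;

    -- Record the current event, including the first event of each object.
    if @event = 342 set @bits |= 1;
    else if @event = 343 set @bits |= 2;
    else if @event = 401 set @bits |= 4;

    if (@bits = 7) 
    begin
        insert @triplets values (@ob, @event, @dt);
        set @bits = 0;
    end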
    fetch next from c into @ob, @event, @dt;
end
close c;
deallocate c;
select      t.*, eventscomplete = iif(tt.objectid is null, 0, 1)
from        t
left join   @triplets tt    on  t.objectid = tt.objectid 
                                and t.eventid = tt.eventid
                                and t.eventdate = tt.eventdate
order by    t.objectid, t.eventdate;
