简体   繁体   中英

Finding overlapping timestamps in query

I have an issue with removing or tagging overlapping timestamps grouped by certain ID.

Times can overlap in nest and may have same start time or end time.
If second time starts before previous time has ended it will end before or at the same time as previous time. No time differences will go over 12 hours.

Using T-SQL.

Sample data:

ID  task_id starttime                       endtime
11  1       2023-01-10 06:31:00.000         2023-01-10 08:53:00.000
11  1       2023-01-10 08:00:00.000         2023-01-10 08:53:00.000
11  2       2023-01-10 13:14:00.000         2023-01-10 15:15:00.000
11  2       2023-01-10 15:46:00.000         2023-01-10 17:59:00.000
11  2       2023-01-10 18:49:00.000         2023-01-10 18:50:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 11:10:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 10:50:00.000
13  4       2023-01-08 20:00:00.000         2023-01-09 03:44:00.000
13  4       2023-01-08 21:00:00.000         2023-01-09 02:00:00.000
14  5       2023-01-01 19:23:00.000         2023-01-01 20:47:00.000
14  5       2023-01-02 03:35:00.000         2023-01-02 06:57:00.000

Desired result:

ID  task_id starttime                       endtime
11  1       2023-01-10 06:31:00.000         2023-01-10 08:53:00.000
11  2       2023-01-10 13:14:00.000         2023-01-10 15:15:00.000
11  2       2023-01-10 15:46:00.000         2023-01-10 17:59:00.000
11  2       2023-01-10 18:49:00.000         2023-01-10 18:50:00.000
12  3       2023-01-09 10:10:00.000         2023-01-09 11:10:00.000
13  4       2023-01-08 20:00:00.000         2023-01-09 03:44:00.000
14  5       2023-01-01 19:23:00.000         2023-01-01 20:47:00.000
14  5       2023-01-02 03:35:00.000         2023-01-02 06:57:00.000

I've tried methods with lead or lag functions but it doesn't seem to play well with edge cases. For example:

case when lead(starttime) over (partition by task_id order by starttime) <> endtime then 1 else 0 end as overlap_tag

Doesn't count the time in ID 11 task_id 2 from 18:49-18:50 as not overlapping and doesn't seem to take into account the day changing.

I only tested it on PostgreSQL, but it might help.


Conditions

Preparation

CREATE TABLE task_duration (
    id INTEGER, 
    task_id INTEGER,
    start_time TIMESTAMP, 
    end_time TIMESTAMP
);

INSERT INTO task_duration VALUES (11, 1, '2023-01-10 06:31:00.000', '2023-01-10 08:53:00.000');
INSERT INTO task_duration VALUES (11, 1, '2023-01-10 08:00:00.000', '2023-01-10 08:53:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 13:14:00.000', '2023-01-10 15:15:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 15:46:00.000', '2023-01-10 17:59:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 18:49:00.000', '2023-01-10 18:50:00.000');
INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 11:10:00.000');
INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 10:50:00.000');
INSERT INTO task_duration VALUES (13, 4, '2023-01-08 20:00:00.000', '2023-01-09 03:44:00.000');
INSERT INTO task_duration VALUES (13, 4, '2023-01-08 21:00:00.000', '2023-01-09 02:00:00.000');
INSERT INTO task_duration VALUES (14, 5, '2023-01-01 19:23:00.000', '2023-01-01 20:47:00.000');
INSERT INTO task_duration VALUES (14, 5, '2023-01-02 03:35:00.000', '2023-01-02 06:57:00.000');

Query

SELECT id, 
    task_id, 
    start_time, 
    end_time
FROM (
    SELECT id, 
        task_id, 
        start_time, 
        end_time, 
        LAG(start_time) OVER (PARTITION BY task_id ORDER BY task_id, start_time, end_time DESC) AS prev_start_time, 
        LAG(end_time) OVER (PARTITION BY task_id ORDER BY task_id, start_time, end_time DESC) AS prev_end_time
    FROM task_duration
) v
WHERE prev_start_time IS NULL  -- 1st condition
    OR NOT (v.end_time >= v.prev_start_time AND v.start_time <= v.prev_end_time);  -- 2nd condition

Result

id|task_id|start_time             |end_time               |
--+-------+-----------------------+-----------------------+
11|      1|2023-01-10 06:31:00.000|2023-01-10 08:53:00.000|
11|      2|2023-01-10 13:14:00.000|2023-01-10 15:15:00.000|
11|      2|2023-01-10 15:46:00.000|2023-01-10 17:59:00.000|
11|      2|2023-01-10 18:49:00.000|2023-01-10 18:50:00.000|
12|      3|2023-01-09 10:10:00.000|2023-01-09 11:10:00.000|
13|      4|2023-01-08 20:00:00.000|2023-01-09 03:44:00.000|
14|      5|2023-01-01 19:23:00.000|2023-01-01 20:47:00.000|
14|      5|2023-01-02 03:35:00.000|2023-01-02 06:57:00.000|

Try this https://dbfiddle.uk/id_waBN _ By using recursive query you make sure you cover any number of overlapping intervals. You start by the rows not intersected by previous one.

with task_duration_wrn(id, task_id, starttime, endtime, rn, act) as (
    select id, task_id, starttime, endtime, 
        rank() over(partition by id, task_id order by starttime, endtime) as rn,
        cast( 
        case when 
         starttime <= lag(endtime) over(partition by id, task_id order by starttime, endtime) 
        then 'PACK' end as VARCHAR(4))
    from task_duration
),
cte(id, task_id, starttime, endtime, rn, lvl, act) as (
    select d.id, d.task_id, d.starttime, d.endtime, d.rn, 1, 
    CAST(NULL AS VARCHAR(4))
    from task_duration_wrn d
    where act is NULL
    
    union all
    
    select d.id, d.task_id, c.starttime, d.endtime, d.rn, c.lvl+1, d.act
    from cte c
    join task_duration_wrn d on c.id = d.id and c.task_id = d.task_id and 
        c.lvl+1 = d.rn
    where d.act = 'PACK'    
)
select id, task_id, starttime, max(endtime) as endtime
from cte c
group by id, task_id, starttime
order by id, task_id, starttime

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM