Finding overlapping timestamps in query
I have an issue with removing or tagging overlapping timestamps grouped by a certain ID.
Times can be nested inside one another and may share the same start time or end time.
If a second time starts before the previous time has ended, it will end before or at the same time as the previous time. No time difference will exceed 12 hours.
Using T-SQL.
Sample data:
ID task_id starttime endtime
11 1 2023-01-10 06:31:00.000 2023-01-10 08:53:00.000
11 1 2023-01-10 08:00:00.000 2023-01-10 08:53:00.000
11 2 2023-01-10 13:14:00.000 2023-01-10 15:15:00.000
11 2 2023-01-10 15:46:00.000 2023-01-10 17:59:00.000
11 2 2023-01-10 18:49:00.000 2023-01-10 18:50:00.000
12 3 2023-01-09 10:10:00.000 2023-01-09 11:10:00.000
12 3 2023-01-09 10:10:00.000 2023-01-09 10:50:00.000
13 4 2023-01-08 20:00:00.000 2023-01-09 03:44:00.000
13 4 2023-01-08 21:00:00.000 2023-01-09 02:00:00.000
14 5 2023-01-01 19:23:00.000 2023-01-01 20:47:00.000
14 5 2023-01-02 03:35:00.000 2023-01-02 06:57:00.000
Desired result:
ID task_id starttime endtime
11 1 2023-01-10 06:31:00.000 2023-01-10 08:53:00.000
11 2 2023-01-10 13:14:00.000 2023-01-10 15:15:00.000
11 2 2023-01-10 15:46:00.000 2023-01-10 17:59:00.000
11 2 2023-01-10 18:49:00.000 2023-01-10 18:50:00.000
12 3 2023-01-09 10:10:00.000 2023-01-09 11:10:00.000
13 4 2023-01-08 20:00:00.000 2023-01-09 03:44:00.000
14 5 2023-01-01 19:23:00.000 2023-01-01 20:47:00.000
14 5 2023-01-02 03:35:00.000 2023-01-02 06:57:00.000
I've tried methods using the LEAD or LAG functions, but they don't seem to handle edge cases well. For example:
case when lead(starttime) over (partition by task_id order by starttime) <> endtime then 1 else 0 end as overlap_tag
This doesn't count the 18:49-18:50 time in ID 11, task_id 2 as not overlapping, and it doesn't seem to take the day changing into account.
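To make the edge case concrete, here is a minimal sketch that replays the expression above on the three task_id 2 rows. It assumes Python's bundled sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server; the NULL handling it relies on is standard SQL. The expression checks whether the next start differs from the current end, which is not an overlap test, and on the last row of each partition LEAD returns NULL, so the comparison is unknown and the CASE falls through to ELSE 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_duration "
    "(id INTEGER, task_id INTEGER, starttime TEXT, endtime TEXT)"
)
# The three non-overlapping rows of ID 11, task_id 2 from the sample data.
conn.executemany(
    "INSERT INTO task_duration VALUES (?, ?, ?, ?)",
    [
        (11, 2, "2023-01-10 13:14:00.000", "2023-01-10 15:15:00.000"),
        (11, 2, "2023-01-10 15:46:00.000", "2023-01-10 17:59:00.000"),
        (11, 2, "2023-01-10 18:49:00.000", "2023-01-10 18:50:00.000"),
    ],
)

# Same CASE expression as in the question, unchanged.
ATTEMPT_SQL = """
SELECT starttime,
       CASE WHEN LEAD(starttime) OVER (PARTITION BY task_id ORDER BY starttime) <> endtime
            THEN 1 ELSE 0 END AS overlap_tag
FROM task_duration
ORDER BY starttime
"""

tags = [row[1] for row in conn.execute(ATTEMPT_SQL)]
print(tags)  # the last row gets a different tag only because LEAD is NULL there
```

None of these three rows overlap, yet the last one is tagged differently from the other two, which matches the behaviour described above.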
I only tested it on PostgreSQL, but it might help.
Conditions
Preparation
CREATE TABLE task_duration (
    id INTEGER,
    task_id INTEGER,
    start_time TIMESTAMP,
    end_time TIMESTAMP
);
INSERT INTO task_duration VALUES (11, 1, '2023-01-10 06:31:00.000', '2023-01-10 08:53:00.000');
INSERT INTO task_duration VALUES (11, 1, '2023-01-10 08:00:00.000', '2023-01-10 08:53:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 13:14:00.000', '2023-01-10 15:15:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 15:46:00.000', '2023-01-10 17:59:00.000');
INSERT INTO task_duration VALUES (11, 2, '2023-01-10 18:49:00.000', '2023-01-10 18:50:00.000');
INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 11:10:00.000');
INSERT INTO task_duration VALUES (12, 3, '2023-01-09 10:10:00.000', '2023-01-09 10:50:00.000');
INSERT INTO task_duration VALUES (13, 4, '2023-01-08 20:00:00.000', '2023-01-09 03:44:00.000');
INSERT INTO task_duration VALUES (13, 4, '2023-01-08 21:00:00.000', '2023-01-09 02:00:00.000');
INSERT INTO task_duration VALUES (14, 5, '2023-01-01 19:23:00.000', '2023-01-01 20:47:00.000');
INSERT INTO task_duration VALUES (14, 5, '2023-01-02 03:35:00.000', '2023-01-02 06:57:00.000');
Query
SELECT id,
       task_id,
       start_time,
       end_time
FROM (
    SELECT id,
           task_id,
           start_time,
           end_time,
           LAG(start_time) OVER (PARTITION BY task_id ORDER BY start_time, end_time DESC) AS prev_start_time,
           LAG(end_time) OVER (PARTITION BY task_id ORDER BY start_time, end_time DESC) AS prev_end_time
    FROM task_duration
) v
WHERE prev_start_time IS NULL -- 1st condition: first row of its task_id
   OR NOT (v.end_time >= v.prev_start_time AND v.start_time <= v.prev_end_time); -- 2nd condition: no overlap with the previous row
Result
id|task_id|start_time |end_time |
--+-------+-----------------------+-----------------------+
11| 1|2023-01-10 06:31:00.000|2023-01-10 08:53:00.000|
11| 2|2023-01-10 13:14:00.000|2023-01-10 15:15:00.000|
11| 2|2023-01-10 15:46:00.000|2023-01-10 17:59:00.000|
11| 2|2023-01-10 18:49:00.000|2023-01-10 18:50:00.000|
12| 3|2023-01-09 10:10:00.000|2023-01-09 11:10:00.000|
13| 4|2023-01-08 20:00:00.000|2023-01-09 03:44:00.000|
14| 5|2023-01-01 19:23:00.000|2023-01-01 20:47:00.000|
14| 5|2023-01-02 03:35:00.000|2023-01-02 06:57:00.000|
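One caveat: LAG only looks at the immediately preceding row, so a row nested inside an earlier, longer interval can slip through when a shorter row sits between them. A possible variant, sketched below, compares each start against the running maximum of all earlier end times instead. This is a sketch under assumptions, not a tested T-SQL solution: it runs through Python's sqlite3 module (SQLite 3.25 or later), the id 15 rows are hypothetical and not part of the question, and it partitions by id as well as task_id. The same window frame syntax exists in SQL Server 2012 and later.

```python
import sqlite3

# Rows for ids 11, 13 and 14 come from the sample data. The id 15 rows are a
# hypothetical extra case: the third row overlaps the first but not the second.
ROWS = [
    (11, 1, "2023-01-10 06:31:00.000", "2023-01-10 08:53:00.000"),
    (11, 1, "2023-01-10 08:00:00.000", "2023-01-10 08:53:00.000"),
    (13, 4, "2023-01-08 20:00:00.000", "2023-01-09 03:44:00.000"),
    (13, 4, "2023-01-08 21:00:00.000", "2023-01-09 02:00:00.000"),
    (14, 5, "2023-01-01 19:23:00.000", "2023-01-01 20:47:00.000"),
    (14, 5, "2023-01-02 03:35:00.000", "2023-01-02 06:57:00.000"),
    (15, 6, "2023-01-11 06:00:00.000", "2023-01-11 09:00:00.000"),
    (15, 6, "2023-01-11 06:30:00.000", "2023-01-11 07:00:00.000"),
    (15, 6, "2023-01-11 08:00:00.000", "2023-01-11 08:30:00.000"),
]

# Keep a row only if it starts after every earlier row in its group has ended.
# The frame stops at 1 PRECEDING, so the first row of a group sees NULL.
RUNNING_MAX_SQL = """
SELECT id, task_id, start_time, end_time
FROM (
    SELECT id, task_id, start_time, end_time,
           MAX(end_time) OVER (
               PARTITION BY id, task_id
               ORDER BY start_time, end_time DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_max_end
    FROM task_duration
) v
WHERE prev_max_end IS NULL
   OR start_time > prev_max_end
ORDER BY id, task_id, start_time
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_duration "
    "(id INTEGER, task_id INTEGER, start_time TEXT, end_time TEXT)"
)
conn.executemany("INSERT INTO task_duration VALUES (?, ?, ?, ?)", ROWS)

kept = conn.execute(RUNNING_MAX_SQL).fetchall()
for row in kept:
    print(row)
```

Like the query above, this treats a row that starts exactly when an earlier row ends as overlapping; change `>` to `>=` if touching intervals should be kept.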
Try this: https://dbfiddle.uk/id_waBN
By using a recursive query you make sure you cover any number of overlapping intervals. You start with the rows that are not intersected by the previous one.
with task_duration_wrn(id, task_id, starttime, endtime, rn, act) as (
    -- number the rows of each task and mark those that start before the previous row ends
    select id, task_id, starttime, endtime,
           row_number() over(partition by id, task_id order by starttime, endtime) as rn,
           cast(
               case when
                   starttime <= lag(endtime) over(partition by id, task_id order by starttime, endtime)
               then 'PACK' end as VARCHAR(4))
    from task_duration
),
cte(id, task_id, starttime, endtime, rn, lvl, act) as (
    -- anchor: rows that do not intersect the previous row
    select d.id, d.task_id, d.starttime, d.endtime, d.rn, 1,
           CAST(NULL AS VARCHAR(4))
    from task_duration_wrn d
    where act is NULL
    union all
    -- recursion: attach the next row while it is marked 'PACK', keeping the anchor's starttime
    select d.id, d.task_id, c.starttime, d.endtime, d.rn, c.lvl+1, d.act
    from cte c
    join task_duration_wrn d on c.id = d.id and c.task_id = d.task_id and
                                c.rn+1 = d.rn
    where d.act = 'PACK'
)
select id, task_id, starttime, max(endtime) as endtime
from cte c
group by id, task_id, starttime
order by id, task_id, starttime
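For comparison, the same merge can probably be done without recursion. The sketch below is an alternative technique (a "gaps and islands" grouping), not the query from the fiddle: a row starts a new island when its start is later than the running maximum of all earlier end times, a cumulative sum of those flags numbers the islands, and each island is collapsed to MIN(starttime) and MAX(endtime). It is run here through Python's sqlite3 module (SQLite 3.25 or later) purely as an assumption for illustration; the names `marked`, `grouped`, `is_new_island` and `island` are made up for this example.

```python
import sqlite3

# The eleven sample rows from the question.
ROWS = [
    (11, 1, "2023-01-10 06:31:00.000", "2023-01-10 08:53:00.000"),
    (11, 1, "2023-01-10 08:00:00.000", "2023-01-10 08:53:00.000"),
    (11, 2, "2023-01-10 13:14:00.000", "2023-01-10 15:15:00.000"),
    (11, 2, "2023-01-10 15:46:00.000", "2023-01-10 17:59:00.000"),
    (11, 2, "2023-01-10 18:49:00.000", "2023-01-10 18:50:00.000"),
    (12, 3, "2023-01-09 10:10:00.000", "2023-01-09 11:10:00.000"),
    (12, 3, "2023-01-09 10:10:00.000", "2023-01-09 10:50:00.000"),
    (13, 4, "2023-01-08 20:00:00.000", "2023-01-09 03:44:00.000"),
    (13, 4, "2023-01-08 21:00:00.000", "2023-01-09 02:00:00.000"),
    (14, 5, "2023-01-01 19:23:00.000", "2023-01-01 20:47:00.000"),
    (14, 5, "2023-01-02 03:35:00.000", "2023-01-02 06:57:00.000"),
]

ISLANDS_SQL = """
WITH marked AS (
    -- 1 when the row starts after every earlier row has ended (or is the first row)
    SELECT id, task_id, starttime, endtime,
           CASE WHEN starttime <= MAX(endtime) OVER (
                         PARTITION BY id, task_id
                         ORDER BY starttime, endtime
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS is_new_island
    FROM task_duration
),
grouped AS (
    -- the running total of the flags gives every island its own number
    SELECT id, task_id, starttime, endtime,
           SUM(is_new_island) OVER (
               PARTITION BY id, task_id
               ORDER BY starttime, endtime, is_new_island DESC
               ROWS UNBOUNDED PRECEDING) AS island
    FROM marked
)
SELECT id, task_id, MIN(starttime) AS starttime, MAX(endtime) AS endtime
FROM grouped
GROUP BY id, task_id, island
ORDER BY id, task_id, starttime
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_duration "
    "(id INTEGER, task_id INTEGER, starttime TEXT, endtime TEXT)"
)
conn.executemany("INSERT INTO task_duration VALUES (?, ?, ?, ?)", ROWS)

merged = conn.execute(ISLANDS_SQL).fetchall()
for row in merged:
    print(row)
```

Because it compares against the running maximum rather than only the previous row, this version also merges a row that overlaps an earlier interval but not its immediate predecessor.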