Postgres check for timestamp range overlap in table rows
We have a Postgres table (materialized view) containing around 2 million rows with columns like:
For each row in the table, we would like to add a result column that contains:
What would be an efficient approach to label each row as having overlap (1 or 0)?
EDIT:
The expected output would be something like:
I don't think there will be a really fast solution to this, as it requires comparing every row in the table with every other row in the table (or at least with every other row in the specified range).
Assuming your table's primary key column is named id, you could use Postgres' range functions to check for overlapping rows:
with check_period (check_range) as (
    values ( tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00') )
)
select id,
       start_time,
       end_time,
       -- true if any other row inside the check period overlaps this row
       exists (select *
               from the_table t2
                 cross join check_period
               where t2.id <> t1.id
                 and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
                 and tstzrange(t2.start_time, t2.end_time) <@ check_range
              ) as has_overlapping_rows
from the_table t1
  cross join check_period
where tstzrange(t1.start_time, t1.end_time) <@ check_range;
The CTE check_period is only there so that the values for the time period you want to analyze are not repeated. If you don't care about repeating them, you can remove it:
select id,
       start_time,
       end_time,
       -- true if any other row inside the check period overlaps this row
       exists (select *
               from the_table t2
               where t2.id <> t1.id
                 and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
                 and tstzrange(t2.start_time, t2.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00')
              ) as has_overlapping_rows
from the_table t1
where tstzrange(t1.start_time, t1.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00');
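As a quick illustration of the two range operators these queries rely on, the following standalone statement can be run in psql. The timestamps are made up for the example. Note that tstzrange builds a half-open range by default (lower bound inclusive, upper bound exclusive), so two periods that merely touch at an endpoint are not treated as overlapping, and a range whose lower and upper bounds are equal is empty and overlaps nothing.

```sql
-- Illustrative values only; none of these timestamps come from the question.
select
    -- && is true when the two ranges share at least one point
    tstzrange(timestamptz '2018-10-01 00:00', timestamptz '2018-10-05 00:00')
        && tstzrange(timestamptz '2018-10-04 00:00', timestamptz '2018-10-10 00:00') as overlapping,
    -- default bounds are '[)', so ranges that only touch do not overlap
    tstzrange(timestamptz '2018-10-01 00:00', timestamptz '2018-10-05 00:00')
        && tstzrange(timestamptz '2018-10-05 00:00', timestamptz '2018-10-10 00:00') as touching,
    -- <@ is true when the left range is fully contained in the right one
    tstzrange(timestamptz '2018-10-02 00:00', timestamptz '2018-10-03 00:00')
        <@ tstzrange(timestamptz '2018-10-01 00:00', timestamptz '2018-10-14 20:15') as contained;
```

The three columns should come back as true, false and true respectively.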
You should create an index on the timestamp range to make these queries quick:
create index on the_table ( (tstzrange(start_time, end_time)), id );
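One caveat, offered as a suggestion to test rather than something measured on the table in question: a default B-tree index on a range expression supports equality and ordering comparisons, whereas the && and <@ operators are covered by the GiST (and SP-GiST) operator classes for range types. A GiST index on the same expression may therefore be worth trying. The index name below is made up, and the explain query is only a simplified probe, not the full query from above.

```sql
-- Hypothetical index name; the_table, start_time and end_time are the names used above.
create index the_table_period_gist
    on the_table
    using gist ((tstzrange(start_time, end_time)));

-- Inspect the plan to see whether the index is picked up for an overlap test
explain
select exists (select *
               from the_table t2
               where tstzrange(t2.start_time, t2.end_time)
                     && tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00'));
```

Whether the planner chooses this index depends on the data distribution and table statistics, so comparing the explain output with and without it is the safest way to decide.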
You can extend the above query to return a count of the overlapping rows rather than a true/false flag:
select id,
       start_time,
       end_time,
       -- number of other rows inside the check period that overlap this row
       (select count(*)
        from the_table t2
        where t2.id <> t1.id
          and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
          and tstzrange(t2.start_time, t2.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00')
       ) as overlapping_row_count
from the_table t1
where tstzrange(t1.start_time, t1.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00');
However, for rows having many overlapping rows, this will be slower, because count(*) forces the database to inspect all overlapping rows. The exists() solution can stop at the first row found.
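Since the question asks for a 1 or 0 label rather than a boolean, the result of exists() can be cast to an integer. A minimal sketch, reusing the table and column names from above; the alias has_overlap is made up, and the check-period filter is left out for brevity:

```sql
select t1.id,
       t1.start_time,
       t1.end_time,
       -- boolean -> integer: true becomes 1, false becomes 0
       (exists (select *
                from the_table t2
                where t2.id <> t1.id
                  and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
               ))::int as has_overlap
from the_table t1;
```

This keeps the early-exit behaviour of exists() while producing the integer column the question describes.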