Postgres check for timestamp range overlap in table rows

We have a Postgres table (materialized view) containing around 2 million rows with columns like:

  • start_time (timestamptz) - has index
  • end_time (timestamptz) - has index

For each row in the table, we would like to add a result column that contains:

  • 1, if the row's start and end time range overlaps with any other row
  • 0, if the row's start and end time range does not overlap with any other row

What would be an efficient approach to label each row as having overlap (1 or 0)?

EDIT:

The expected output would be something like:

  • row_id
  • has_overlap - boolean or int (1 or 0)

I don't think there will be a really fast solution to that, as it requires comparing every row in the table with every other row in the table (or at least with every other row in the specified range).
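If every row in the table has to be labelled, one possible way to avoid the pairwise comparison is to sort the rows once by start_time and compare each row only with its neighbours in that order. The following is a sketch of that sort-and-scan technique using window functions, not the approach described in the rest of this answer. It assumes the table is called the_table with a unique id column, that start_time and end_time are never null, that start_time < end_time in every row, and that ranges are half-open (end_time itself is not part of the range).

```sql
select id,
       start_time,
       end_time,
       -- an earlier-starting row overlaps if the latest end_time seen so far
       -- lies after this row's start_time
       coalesce(
         max(end_time) over (order by start_time, id
                             rows between unbounded preceding and 1 preceding) > start_time,
         false)
       or
       -- a later-starting row overlaps if the very next start_time
       -- lies before this row's end_time
       coalesce(
         lead(start_time) over (order by start_time, id) < end_time,
         false) as has_overlap
from the_table;
```

The first and last rows in the sort order have no preceding or following neighbour, so the window expressions return null there and coalesce() turns that into false.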

Assuming your table's primary key column is named id, you could use Postgres' range functions to check for overlapping rows:

-- the time period to analyze, defined once
with check_period (check_range) as (
   values ( tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00') )
)
select id, 
       start_time, 
       end_time, 
       -- true if any other row inside the period overlaps this row
       exists (select *
        from the_table t2
           cross join check_period
        where t2.id <> t1.id 
        and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
        and tstzrange(t2.start_time, t2.end_time) <@ check_range
       ) has_overlapping_rows
from the_table t1
  cross join check_period
where tstzrange(t1.start_time, t1.end_time) <@ check_range;

The CTE check_period is only there so that the values for the time period you want to analyze are not repeated. If you don't care about repeating them, you can remove it:

select id, 
       start_time, 
       end_time, 
       exists (select *
        from the_table t2
        where t2.id <> t1.id 
        and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
        and tstzrange(t2.start_time, t2.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00')
       ) has_overlapping_rows
from the_table t1
where tstzrange(t1.start_time, t1.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00');
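The two range operators these queries rely on are && ("overlaps") and <@ ("is contained by"). A small standalone illustration with made-up timestamps may help; note that the two-argument form of tstzrange() builds a range that includes its lower bound and excludes its upper bound, so ranges that merely touch are not reported as overlapping:

```sql
select
  -- the two ranges share the period from 2018-10-04 to 2018-10-05: true
  tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-05 00:00:00')
    && tstzrange(timestamptz '2018-10-04 00:00:00', timestamptz '2018-10-08 00:00:00') as overlapping,
  -- the first range ends exactly where the second starts: false
  tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-05 00:00:00')
    && tstzrange(timestamptz '2018-10-05 00:00:00', timestamptz '2018-10-08 00:00:00') as touching,
  -- the first range lies completely inside the second: true
  tstzrange(timestamptz '2018-10-02 00:00:00', timestamptz '2018-10-03 00:00:00')
    <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00') as contained;
```

If touching rows should count as overlapping, the bounds can be passed as a third argument, for example tstzrange(start_time, end_time, '[]').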

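You should create an index on the timestamp range to make these queries quick. The && and <@ range operators are supported by GiST (and SP-GiST) indexes, not by the default B-tree index type, so the index should be a GiST one:

create index on the_table using gist ( tstzrange(start_time, end_time) );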
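Whether the planner actually uses an index on the range expression depends on the index type and on the data, so it is worth checking the execution plan. A minimal sketch, assuming the table and column names used above; the id value 42 is a made-up example:

```sql
-- Show the actual plan for the overlap probe of a single row.
-- An index scan (or bitmap index scan) on the_table in the subplan
-- indicates that the range index is being used.
explain (analyze, buffers)
select exists (select *
               from the_table t2
               where t2.id <> t1.id
                 and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time))
from the_table t1
where t1.id = 42;  -- hypothetical id, replace with one that exists
```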

You can extend the exists() query to return a count of the overlapping rows rather than a true/false flag:

select id, 
       start_time, 
       end_time, 
       (select count(*)
        from the_table t2
        where t2.id <> t1.id 
        and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
        and tstzrange(t2.start_time, t2.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00')
       ) has_overlapping_rows
from the_table t1
where tstzrange(t1.start_time, t1.end_time) <@ tstzrange(timestamptz '2018-10-01 00:00:00', timestamptz '2018-10-14 20:15:00');

However, for rows that have many overlapping rows this will be slower, because count(*) forces the database to inspect all overlapping rows. The exists() solution can stop at the first row found.
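To get output in the shape asked for in the question (row_id plus a 1/0 flag), the boolean result of exists() can be cast to an integer. A sketch, assuming the same the_table and id names as above and no restriction to a time period:

```sql
select t1.id as row_id,
       (exists (select *
                from the_table t2
                where t2.id <> t1.id
                  and tstzrange(t1.start_time, t1.end_time) && tstzrange(t2.start_time, t2.end_time)
               ))::int as has_overlap   -- true becomes 1, false becomes 0
from the_table t1;
```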
