
PostgreSQL and Sequential Data

I have a dataset that contains:

Table { date itemName }

The dates are, for the most part, sequential. There are no duplicate dates, because the date is the primary key.

The question has multiple parts (all with respect to using SQL):

  1. Is it possible to find gaps in the date series listed in the table? For example: dates 1/2/09-1/3/09 are missing.
  2. Is it possible to find ranges of dates missing from the table that span more than n days (n being a number determined at run time)? For example: for n = 2, dates 1/2/09-1/3/09 are not returned, but dates 5/6/09-6/1/09 are.

If you can use PostgreSQL 8.4 or later, window functions will help:

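SELECT *
    FROM (SELECT itemName, date, date - lag(date) OVER w AS gap
              FROM someTable WINDOW w AS (ORDER BY date)
         ) AS pairs
    -- gap is an interval when date is a timestamp column; for a DATE column
    -- the difference is an integer, so compare with 1 instead.
    WHERE pairs.gap > '1 day'::interval;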
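
To address the second part of the question, the same lag() idea can be extended to report each missing range and filter by its length. This is a minimal sketch, not a definitive solution. It assumes the date column is of type DATE (so subtracting two dates yields an integer number of days), and the literal 2 is only a stand-in for the run-time value n:

```sql
-- Assumption: someTable.date is of type DATE, so date - date is an integer
-- and date +/- integer is a date.
-- Assumption: the literal 2 stands in for n; bind it as a parameter in practice.
SELECT prev_date + 1        AS gap_start,    -- first missing day
       date - 1             AS gap_end,      -- last missing day
       date - prev_date - 1 AS missing_days  -- number of missing days
FROM (
    SELECT date, lag(date) OVER (ORDER BY date) AS prev_date
    FROM someTable
) AS pairs
WHERE date - prev_date - 1 > 2   -- keep only gaps longer than n days
ORDER BY gap_start;
```

The first row has no predecessor, so prev_date is NULL there and the WHERE clause drops it. Replacing the threshold with 0 lists every gap, which covers the first part of the question.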

Just create a function in PL/pgSQL or in a client that checks every date. Like this pseudocode:

date checked_date = 2000-01-01;   // earliest date to check
int unchecked_section = 0;        // length of the current run of missing dates
while ( checked_date <= today() ) {
  if (! sql(select itemName from Table where date=checked_date)) {
    unchecked_section++;
  } else {
    if ( unchecked_section>n ) {
      // the gap covers the unchecked_section days before checked_date
      print checked_date-unchecked_section, checked_date-1
    }
    unchecked_section = 0;
  }
  checked_date++;
}
if ( unchecked_section>n ) {      // gap that runs up to today
  print checked_date-unchecked_section, checked_date-1
}

It does not have to be very fast, since it is a maintenance task only. There aren't many dates to check: only about 365 per year.
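
If a loop is not required, the same day-by-day check can probably be expressed as a single query. The sketch below is one possible set-based equivalent, not the method described above. It assumes the date column is of type DATE and that generate_series over timestamps is available (PostgreSQL 8.4 or later). It lists every calendar day between the earliest and latest stored dates that has no row:

```sql
-- Assumption: "Table".date is of type DATE.
-- The table name is quoted because TABLE is a reserved word.
SELECT d::date AS missing_date
FROM generate_series(
         (SELECT min(date) FROM "Table")::timestamp,  -- first stored day
         (SELECT max(date) FROM "Table")::timestamp,  -- last stored day
         interval '1 day'
     ) AS d
WHERE NOT EXISTS (
    SELECT 1 FROM "Table" AS t WHERE t.date = d::date
)
ORDER BY missing_date;
```

This returns individual missing days rather than ranges, so grouping consecutive days into gaps of a given length would still need an extra step.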

After some testing I came up with the following SQL statement:

SELECT date, itemName
  FROM "Table" as t1
  WHERE NOT EXISTS (
     SELECT date 
     FROM "Table" as t2 
     WHERE t2.date = (t1.date - INTERVAL '1 day')
  )
  ORDER BY date
  OFFSET 1  -- this will skip the first element

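This will get you all rows that have no direct predecessor, i.e. every row that comes right after a gap. The OFFSET 1 skips the earliest row, which has nothing before it at all.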

If you modify the statement to:

SELECT date, itemName
  FROM "Table" as t1
  WHERE NOT EXISTS (
    SELECT date 
    FROM "Table" as t2 
    WHERE (t2.date >= (t1.date - INTERVAL '2 day'))
    AND (t2.date < t1.date)
  )
  ORDER BY date
  OFFSET 1

you can use the INTERVAL length in the subselect's WHERE clause to filter for gaps of at least that size.
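
Since n is only known at run time, the interval can be computed from a parameter instead of being hard-coded. This is a rough sketch using a prepared statement. The statement name gaps_longer_than is made up for illustration, and the interval is n + 1 days because the question asks for gaps strictly longer than n:

```sql
-- Hypothetical statement name; $1 is the run-time value n.
PREPARE gaps_longer_than(integer) AS
SELECT date, itemName
  FROM "Table" AS t1
  WHERE NOT EXISTS (
    SELECT 1
    FROM "Table" AS t2
    -- no row in the n + 1 days before t1.date means more than n days are missing
    WHERE t2.date >= t1.date - ($1 + 1) * INTERVAL '1 day'
      AND t2.date < t1.date
  )
  ORDER BY date
  OFFSET 1;  -- skip the earliest row, which has nothing before it

-- Rows that follow a gap of more than 2 missing days
EXECUTE gaps_longer_than(2);
```

Each returned row is the first date after a qualifying gap. The start of the gap is not reported by this form of the query.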

Hope that helps.
