PostgreSQL and Sequential Data
I have a dataset that contains:
Table { date itemName }
The dates are mostly sequential. There are no duplicate dates, since date is the primary key.
The question is split up into multiple parts (all with respect to using SQL):
Dates 1/2/09-1/3/09 are missing.
For n = 2: dates 1/2/09-1/3/09 are not returned, but dates 5/6/09-6/1/09 are.
If you can use PostgreSQL 8.4, window functions will help:
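SELECT *
FROM (
    -- gap = distance to the previous row's date, in date order
    SELECT itemName, date, date - lag(date) OVER w AS gap
    FROM someTable
    WINDOW w AS (ORDER BY date)
) AS pairs
-- assumes "date" is a timestamp column, so gap is an interval;
-- for a column of type date the difference is an integer: use pairs.gap > 1
WHERE pairs.gap > '1 day'::interval;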
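To make it easier to see what lag() contributes, here is a minimal Python sketch of the same idea: sort the dates, pair each one with its predecessor, and keep the pairs that are more than one day apart. The function name find_gaps and the sample dates are made up for illustration and are not part of the answer above.

```python
from datetime import date, timedelta

def find_gaps(dates, min_gap=timedelta(days=1)):
    """Return (previous_date, current_date, gap) for each consecutive pair
    whose distance exceeds min_gap, mimicking date - lag(date)."""
    ordered = sorted(dates)
    gaps = []
    # zip pairs each date with its predecessor, like lag() OVER (ORDER BY date)
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap > min_gap:
            gaps.append((prev, cur, gap))
    return gaps

# Hypothetical sample: Jan 2 and Jan 3 are missing
sample = [date(2009, 1, 1), date(2009, 1, 4), date(2009, 1, 5)]
gaps = find_gaps(sample)
for prev, cur, gap in gaps:
    print(prev, cur, gap.days)
```

Raising min_gap plays the same role as raising the interval in the WHERE clause of the query.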
Just create a function in PL/pgSQL, or in a client, that checks every date. Like this pseudocode:
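date checked_date = 2000-01-01;
int unchecked_section = 0;  // length of the current run of missing dates
while ( checked_date <= today() ) {
    // look the day up by its date column
    if (! sql(select itemName from Table where date=checked_date)) {
        unchecked_section++;
    } else {
        if ( unchecked_section>=n ) {
            // first missing date, and the first present date after the gap
            print checked_date-unchecked_section, checked_date
        }
        unchecked_section = 0;
    }
    checked_date++;
}
// report a gap that is still open when the loop ends
if ( unchecked_section>=n ) {
    print checked_date-unchecked_section, checked_date
}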
It does not have to be very fast, as it is maintenance only. There aren't many dates to check: only 365 a year.
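A runnable version of that loop might look like the Python sketch below. It makes two assumptions that are not in the pseudocode: the existing dates have already been fetched into a set (instead of one query per day), and each gap is reported as its first and last missing date. The name scan_for_gaps and the sample data are hypothetical.

```python
from datetime import date, timedelta

def scan_for_gaps(present, start, end, n):
    """Walk every day from start to end (inclusive) and return
    (first_missing, last_missing) for each run of at least n missing days."""
    one_day = timedelta(days=1)
    gaps = []
    run = 0  # length of the current run of missing dates
    day = start
    while day <= end:
        if day not in present:
            run += 1
        else:
            if run >= n:
                gaps.append((day - run * one_day, day - one_day))
            run = 0
        day += one_day
    # a run may still be open when the scan reaches the end
    if run >= n:
        gaps.append((day - run * one_day, day - one_day))
    return gaps

# Hypothetical data: Jan 2-3, Jan 6-8 and Jan 10 are missing
present = {date(2009, 1, 1), date(2009, 1, 4), date(2009, 1, 5), date(2009, 1, 9)}
result = scan_for_gaps(present, date(2009, 1, 1), date(2009, 1, 10), n=3)
print(result)
```

With n=3 only the three-day run is reported; with n=1 every missing stretch is.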
After some testing I came up with the following SQL statement:
SELECT date, itemName
FROM "Table" AS t1
WHERE NOT EXISTS (
    SELECT date
    FROM "Table" AS t2
    WHERE t2.date = (t1.date - INTERVAL '1 day')
)
ORDER BY date
OFFSET 1 -- this will skip the first element
This will get you all rows that have no direct predecessor, i.e. the first row after each gap. The earliest row never has a predecessor, which is why OFFSET 1 skips it.
If you modify the statement to:
SELECT date, itemName
FROM "Table" AS t1
WHERE NOT EXISTS (
    SELECT date
    FROM "Table" AS t2
    WHERE (t2.date >= (t1.date - INTERVAL '2 day'))
      AND (t2.date < t1.date)
)
ORDER BY date
OFFSET 1
you can use the INTERVAL length in the subselect's WHERE clause to filter by gaps of at least that size.
Hope that helps.
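The effect of the interval length can be checked with a small Python sketch of the NOT EXISTS condition. It assumes one row per calendar day with no time-of-day part, so "no row in the preceding n days" becomes a set lookup. The function name rows_after_gap and the sample dates are illustrative only.

```python
from datetime import date, timedelta

def rows_after_gap(dates, n):
    """Return the dates that have no other date in the n days before them,
    dropping the earliest date (the OFFSET 1 in the query)."""
    present = set(dates)
    hits = []
    for d in sorted(present):
        # NOT EXISTS: none of d-1 .. d-n is in the table
        if not any(d - timedelta(days=k) in present for k in range(1, n + 1)):
            hits.append(d)
    # the earliest date always qualifies, so skip it
    return hits[1:]

# Hypothetical data: a 2-day gap before Jan 4 and a 3-day gap before Jan 9
dates = [date(2009, 1, 1), date(2009, 1, 4), date(2009, 1, 5), date(2009, 1, 9)]
after_2 = rows_after_gap(dates, 2)
after_3 = rows_after_gap(dates, 3)
print(after_2)
print(after_3)
```

With n = 2 both gaps qualify; with n = 3 only the row after the three-day gap remains.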