Finding multiple consecutive dates (datetime) in Ruby on Rails / Postgresql

How can we find X consecutive dates (at hourly granularity) that meet a condition?

EDIT: here is the SQL fiddle: http://sqlfiddle.com/#!17/44928/1

Example:

Find 3 consecutive dates where aa < 2 and bb < 6 and cc < 7

Given this table called weather:

timestamp         aa  bb  cc
01/01/2000 00:00   1   5   5
01/01/2000 01:00   5   5   5
01/01/2000 02:00   1   5   5
01/01/2000 03:00   1   5   5
01/01/2000 04:00   1   5   5
01/01/2000 05:00   1   5   5

The answer should return the 3 records from 02:00, 03:00, 04:00.

How can we do this in Ruby on Rails, or directly in SQL if that is better?

I started working on a method based on this answer: Detect consecutive dates ranges using SQL

def consecutive_dates
  the_query = "WITH t AS (
    SELECT timestamp d,ROW_NUMBER() OVER(ORDER BY timestamp) i
    FROM @d
    GROUP BY timestamp
  )
  SELECT MIN(d),MAX(d)
  FROM t
  GROUP BY DATEDIFF(hour,i,d)"

  ActiveRecord::Base.connection.execute(the_query)
end

But I was unable to get it working.

This is a gaps-and-islands problem. Islands are adjacent records that match the condition, and you want islands that are at least 3 records long.

Here is one approach that uses a window count to define the groups: the count increments every time a row that does not match the condition is encountered. We can then count how many rows there are in each group, and use that information to filter.

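select *
from (
    select t.*, count(*) over(partition by grp) cnt
    from (
        select w.*,
            -- running count of non-matching rows; constant within each island
            count(*) filter(where not (aa < 2 and bb < 6 and cc < 7))
                over(order by timestamp) grp
        from weather w
    ) t
    -- keep only matching rows so that cnt is the island length
    where aa < 2 and bb < 6 and cc < 7
) t
where cnt >= 3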
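Since the question also asks about doing this on the Ruby side, the same grouping idea can be sketched in plain Ruby once the rows are loaded. This is only an illustration under stated assumptions: `Row`, `ROWS`, and `islands` are hypothetical names, the data is a hand-built copy of the sample table rather than an ActiveRecord result, and rows are assumed to be exactly one hour apart.

```ruby
# Hypothetical in-memory copy of the question's sample rows (not the real table).
Row = Struct.new(:timestamp, :aa, :bb, :cc)
ROWS = [1, 5, 1, 1, 1, 1].each_with_index.map do |aa, hour|
  Row.new(Time.utc(2000, 1, 1, hour), aa, 5, 5)
end

# Returns runs of matching rows spaced exactly `step` seconds apart
# that contain at least `min_length` rows.
def islands(rows, min_length: 3, step: 3600, &condition)
  rows.select(&condition)
      .sort_by(&:timestamp)
      .chunk_while { |a, b| b.timestamp - a.timestamp == step }
      .select { |run| run.size >= min_length }
end

found = islands(ROWS) { |r| r.aa < 2 && r.bb < 6 && r.cc < 7 }
found.each { |run| puts run.map { |r| r.timestamp.strftime("%H:%M") }.join(", ") }
```

On the sample data this yields a single island covering 02:00 through 05:00, since all four of those rows match. For large tables, doing the grouping in SQL is likely preferable to loading every row into Ruby.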

Assuming that you have one row every hour, an easy way to get the first hour where this occurs uses lead():

select t.*
from (select t.*,
             lead(timestamp, 2) over (order by timestamp) as timestamp_2
      from weather t
      where aa < 2 and bb < 6 and cc < 7
     ) t
where timestamp_2 = timestamp + interval '2 hour';

This filters on the conditions and looks at the row two rows ahead. If that row is two hours ahead, then three rows in a row match the conditions.
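The look-ahead check can be mimicked in Ruby with `each_cons`, which is roughly what `lead(timestamp, 2)` does over the filtered rows. A minimal sketch, assuming the timestamps have already been filtered by the aa/bb/cc condition and are unique; `MATCHING` and `window_starts` are hypothetical names used only for illustration.

```ruby
# Timestamps of the sample rows that pass the aa/bb/cc filter (assumed input).
MATCHING = [0, 2, 3, 4, 5].map { |hour| Time.utc(2000, 1, 1, hour) }

# For every window of `size` filtered rows, keep the first timestamp when the
# last one is exactly (size - 1) steps later, i.e. no hour is missing between.
def window_starts(timestamps, size: 3, step: 3600)
  timestamps.sort
            .each_cons(size)
            .select { |window| window.last - window.first == (size - 1) * step }
            .map(&:first)
end

puts window_starts(MATCHING).map { |t| t.strftime("%H:%M") }.join(", ")
```

On the sample data this reports every hour that begins a run of three, so overlapping runs each contribute a start.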

Note: The above will return both 2000-01-01 02:00 and 2000-01-01 03:00, but you only seem to want the earliest. To handle that, use lag() as well:

select t.*
from (select t.*,
             lag(timestamp) over (order by timestamp) as prev_timestamp,
             lead(timestamp, 2) over (order by timestamp) as timestamp_2
      from weather t
      where aa < 2 and bb < 6 and cc < 7
     ) t
where timestamp_2 = timestamp + interval '2 hour' and
      (prev_timestamp is null or prev_timestamp < timestamp - interval '1 hour');
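The combined lag()/lead() check translates to Ruby as "the previous filtered row is more than one step back, and the row two positions ahead is exactly two steps forward". The sketch below assumes pre-filtered, unique timestamps; `FILTERED` and `island_starts` are hypothetical names, not part of any library.

```ruby
# Timestamps of the sample rows that pass the aa/bb/cc filter (assumed input).
FILTERED = [0, 2, 3, 4, 5].map { |hour| Time.utc(2000, 1, 1, hour) }

# Returns only the timestamps that begin an island of at least `size` rows.
def island_starts(timestamps, size: 3, step: 3600)
  sorted = timestamps.sort
  sorted.each_index.select do |i|
    prev  = i.zero? ? nil : sorted[i - 1]  # plays the role of lag(timestamp)
    ahead = sorted[i + size - 1]           # plays the role of lead(timestamp, size - 1)
    begins_island = prev.nil? || sorted[i] - prev > step
    begins_island && !ahead.nil? && ahead - sorted[i] == (size - 1) * step
  end.map { |i| sorted[i] }
end

puts island_starts(FILTERED).map { |t| t.strftime("%H:%M") }.join(", ")
```

Unlike the look-ahead check alone, this reports one start per island, which matches the "earliest only" behaviour described in the note.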

You can generate the additional hours using generate_series() if you really need the original rows:

select t.timestamp + n.n * interval '1 hour', aa, bb, cc
from (select t.*,
             lead(timestamp, 2) over (order by timestamp) as timestamp_2
      from weather t
      where aa < 2 and bb < 6 and cc < 7
     ) t cross join lateral
     generate_series(0, 2) n
where timestamp_2 = timestamp + interval '2 hour';

Your data seems to have precise timestamps based on the question, so the timestamp equalities will work. If the real data has more fuzziness, then the queries can be tweaked to take this into account.
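One possible way to tolerate imprecise timestamps is to compare the gap between neighbouring rows against the expected step within a tolerance instead of testing for equality. The Ruby sketch below illustrates that idea only; the 60-second tolerance, the sample values, and the name `fuzzy_runs` are all assumptions, and the input is taken to be timestamps that already passed the condition.

```ruby
# Groups timestamps into runs where each gap is within `tolerance` seconds
# of `step`, then keeps runs with at least `min_length` entries.
def fuzzy_runs(timestamps, min_length: 3, step: 3600, tolerance: 60)
  timestamps.sort
            .chunk_while { |a, b| ((b - a) - step).abs <= tolerance }
            .select { |run| run.size >= min_length }
end

# Made-up readings that drift a few seconds around the hour.
readings = [
  Time.utc(2000, 1, 1, 0, 0, 5),
  Time.utc(2000, 1, 1, 2, 0, 10),
  Time.utc(2000, 1, 1, 3, 0, 40),
  Time.utc(2000, 1, 1, 3, 59, 50)
]

fuzzy_runs(readings).each do |run|
  puts run.map { |t| t.strftime("%H:%M:%S") }.join(", ")
end
```

The same tolerance idea could be applied in SQL by replacing the equality on timestamp_2 with a range comparison.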
