
Group records when date is within N minutes

It's not as simple as creating fixed intervals of time that are N minutes long. With N = 15, one record might be at 10:04 and the other at 10:17; they are only 13 minutes apart, yet they fall into different fixed intervals.
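To make the point concrete, here is a small illustrative sketch in Python (chosen only for convenience; the `bucket` helper is hypothetical and not part of the question). It assumes fixed intervals are counted from midnight.

```python
from datetime import datetime

N = 15  # interval length in minutes


def bucket(ts: datetime, n: int = N) -> int:
    """Index of the fixed n-minute interval (counted from midnight) containing ts."""
    return (ts.hour * 60 + ts.minute) // n


a = datetime(2000, 1, 1, 10, 4)
b = datetime(2000, 1, 1, 10, 17)

gap_minutes = (b - a).total_seconds() / 60
print(gap_minutes)           # 13.0: the two records are within N minutes of each other
print(bucket(a), bucket(b))  # 40 41: yet they land in different fixed intervals
```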

Perhaps a user-defined function will work, or maybe a CTE. It could require multiple joins on the same source table.

I'm looking for the most "elegant" solution. Maybe there's a feature in SQL I don't know about that makes this easy.

Here is a reference scenario to make answers more consistent with each other:

create table Comparisons (
  DateField DateTime NOT NULL,
  Amount int NOT NULL default 5  -- rows inserted without an Amount get 5
)

insert into Comparisons (DateField) values ('2000-01-01 10:04'),('2000-01-01 10:17'),
('2000-01-01 12:01'),('2000-01-01 11:54'),('2000-01-01 03:02'),('2000-01-01 03:05'),
('2000-01-01 05:02'),('2000-01-01 05:05'),('2000-01-01 05:19')

Expected output:

  • min: .. 10:04, max: .. 10:17, sum: 10
  • min: .. 11:54, max: .. 12:01, sum: 10
  • min: .. 03:02, max: .. 03:05, sum: 10
  • min: .. 05:02, max: .. 05:19, sum: 15 [optional]

The last output is optional: if an elegant solution produces it as a side effect, that's acceptable, and if it can't, that's not a deal breaker.

It looks like you want to group records so that a new group starts whenever the gap from the previous record is more than <N> minutes.

In SQL Server 2012+, you would use lag() to identify where groups start and a cumulative sum of those start flags to identify the groups:

declare @N int = 15;  -- maximum gap, in minutes, allowed within a group

select min(datefield), max(datefield), count(*) as num, sum(amount)
from (select c.*,
             -- running total of the "group start" flags yields a group id
             sum(case when prev_datefield < dateadd(minute, -@N, datefield)
                      then 1 else 0
                 end) over (order by datefield) as grp
      from (select c.*,
                   lag(datefield) over (order by datefield) as prev_datefield
            from Comparisons c
           ) c
      ) c
group by grp;
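For readers who want to trace the logic outside the database, the following is a minimal Python sketch of the same idea, not the answer's actual implementation. The function name `group_by_gaps` is hypothetical; `prev` plays the role of `lag(datefield)` and `grp` the role of the cumulative sum.

```python
from datetime import datetime, timedelta


def group_by_gaps(rows, n_minutes):
    """rows: iterable of (datetime, amount). Returns [(min, max, count, total), ...]."""
    limit = timedelta(minutes=n_minutes)
    groups = {}
    grp = 0      # running count of "group start" flags seen so far
    prev = None  # previous timestamp in sorted order; None for the first row
    for ts, amount in sorted(rows):
        # A new group starts when the gap to the previous row exceeds N minutes.
        if prev is not None and ts - prev > limit:
            grp += 1
        lo, hi, cnt, total = groups.get(grp, (ts, ts, 0, 0))
        groups[grp] = (min(lo, ts), max(hi, ts), cnt + 1, total + amount)
        prev = ts
    return [groups[g] for g in sorted(groups)]


# Reference data from the question, with Amount defaulting to 5.
times = ["10:04", "10:17", "12:01", "11:54", "03:02", "03:05", "05:02", "05:05", "05:19"]
rows = [(datetime.strptime("2000-01-01 " + t, "%Y-%m-%d %H:%M"), 5) for t in times]

for lo, hi, cnt, total in group_by_gaps(rows, 15):
    print(lo.strftime("%H:%M"), hi.strftime("%H:%M"), cnt, total)
```

On the reference data this yields four groups, including the optional 05:02 to 05:19 group, because 05:05 and 05:19 are only 14 minutes apart and so chain together.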

In earlier versions, you can use correlated subqueries or APPLY to get the same functionality (albeit with much worse performance).
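As a rough illustration of the correlated-subquery route, here is a sketch that avoids window functions entirely. It is an assumption-laden example, not the answer's code: it runs against SQLite through Python's `sqlite3` module purely so that it is self-contained, stores dates as ISO text, and uses `strftime('%s', ...)` for the time arithmetic where SQL Server would use DATEDIFF. The CTE name `starts` and the column `grp_start` are made up for this example.

```python
import sqlite3

N = 15  # maximum gap in minutes within a group

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comparisons (DateField TEXT NOT NULL, Amount INTEGER NOT NULL DEFAULT 5)"
)
times = ["10:04", "10:17", "12:01", "11:54", "03:02", "03:05", "05:02", "05:05", "05:19"]
conn.executemany(
    "INSERT INTO Comparisons (DateField) VALUES (?)",
    [("2000-01-01 " + t,) for t in times],
)

QUERY = """
WITH starts AS (
    -- A row starts a group when no earlier row lies within N minutes before it.
    SELECT c.DateField
    FROM Comparisons c
    WHERE NOT EXISTS (
        SELECT 1
        FROM Comparisons p
        WHERE p.DateField < c.DateField
          AND CAST(strftime('%s', c.DateField) AS INTEGER)
            - CAST(strftime('%s', p.DateField) AS INTEGER) <= :n * 60
    )
)
SELECT MIN(DateField), MAX(DateField), COUNT(*), SUM(Amount)
FROM (
    -- Each row belongs to the latest group start at or before it.
    SELECT c.DateField, c.Amount,
           (SELECT MAX(s.DateField) FROM starts s
            WHERE s.DateField <= c.DateField) AS grp_start
    FROM Comparisons c
) g
GROUP BY grp_start
ORDER BY grp_start
"""

result = conn.execute(QUERY, {"n": N}).fetchall()
for row in result:
    print(row)
```

Each correlated subquery rescans the table for every row, which is where the performance penalty mentioned above comes from.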

I believe this produces the results you want:

DECLARE @Comparisons TABLE (i DATETIME, amt INT NOT NULL DEFAULT(5));
INSERT @Comparisons (i) VALUES ('2016-01-01 10:04:00.000')
, ('2016-01-01 10:17:00.000')
, ('2016-01-01 10:25:00.000')
, ('2016-01-01 10:37:00.000')
, ('2016-01-01 10:44:00.000')
, ('2016-01-01 11:52:00.000')
, ('2016-01-01 11:59:00.000')
, ('2016-01-01 12:10:00.000')
, ('2016-01-01 12:22:00.000')
, ('2016-01-01 13:00:00.000')
, ('2016-01-01 09:00:00.000');

DECLARE @N INT = 15;

WITH T AS (
    SELECT i
         , amt
         , CASE WHEN DATEDIFF(MINUTE, previ, i) <= @N THEN 0 ELSE 1 END RN1
         , CASE WHEN nexti IS NULL OR DATEDIFF(MINUTE, i, nexti) > @N THEN 1 ELSE 0 END RN2 -- the last row overall also closes its group
    FROM @Comparisons t
    OUTER APPLY (SELECT MAX(i) FROM @Comparisons WHERE i < t.i)x(previ)
    OUTER APPLY (SELECT MIN(i) FROM @Comparisons WHERE i > t.i)y(nexti)
    )
, T2 AS (
    SELECT CASE RN1 WHEN 1 THEN i ELSE (SELECT MAX(i) FROM T WHERE RN1 = 1 AND i < T1.i) END mintime
         , CASE WHEN RN2 = 1 THEN i ELSE ISNULL((SELECT MIN(i) FROM T WHERE RN2 = 1 AND i > T1.i), i) END maxtime
         , amt
    FROM T T1
    )
SELECT mintime, maxtime, sum(amt) total
FROM T2
GROUP BY mintime, maxtime
ORDER BY mintime;

It's probably a little clunkier than it could be, but it essentially groups together any records that form a chain in which consecutive records are at most @N minutes apart.

Fixed intervals could be used if adjacent intervals are also checked. This would require tripling the source table records.

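A rough sketch:

-- assumes @N is declared as the interval length in minutes
select min(C.DateField), max(C.DateField), sum(C.Amount)
from Comparisons C
cross join (values (-1), (0), (1)) M(Offset)  -- triples every row
group by (datediff(mi, 0, C.DateField) / @N) + M.Offset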

The problem with this approach is how to eliminate the extra results. I suspect this is a dead-end approach, but someone else might see value in it.

Update: This approach would not work with the 4th expected output [min: .. 05:02, max: .. 05:19, sum: 15]
