Group records whose dates fall within N minutes of each other

Question

It's not as simple as creating intervals of time that are N minutes long. One record might be 10:04, and the other 10:17 where N is 15.
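To make the difficulty concrete, here is a minimal Python sketch (not part of the original question) that assigns each timestamp to a fixed 15-minute interval counted from midnight. The helper name `bucket` is hypothetical and exists only for this illustration.

```python
from datetime import datetime

N = 15  # window length in minutes, as in the example above

def bucket(ts: datetime, n: int = N) -> int:
    """Index of the fixed n-minute interval (counted from midnight) containing ts."""
    return (ts.hour * 60 + ts.minute) // n

a = datetime(2000, 1, 1, 10, 4)
b = datetime(2000, 1, 1, 10, 17)

gap_minutes = int((b - a).total_seconds() // 60)
print(gap_minutes)           # 13, so the two records are within N minutes
print(bucket(a), bucket(b))  # 40 41, yet they land in different fixed intervals
```

The two records are 13 minutes apart but straddle the 10:15 boundary, which is why grouping on a truncated timestamp alone does not give the desired result.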

Perhaps a user-defined function will work, or maybe a CTE. It could require multiple joins on the same source table.

I'm looking for the most "elegant" solution. Maybe there's a feature in SQL I didn't know about which makes this easy.

Here is a reference scenario to make answers more consistent with each other:

create table Comparisons (
  DateField datetime not null,
  Amount int not null default 5  -- the insert below relies on this default
);

insert into Comparisons (DateField) values ('2000-01-01 10:04'),('2000-01-01 10:17'),
('2000-01-01 12:01'),('2000-01-01 11:54'),('2000-01-01 03:02'),('2000-01-01 03:05'),
('2000-01-01 05:02'),('2000-01-01 05:05'),('2000-01-01 05:19')

Expected output:

min: .. 10:04, max: .. 10:17, sum: 10
min: .. 11:54, max: .. 12:01, sum: 10
min: .. 03:02, max: .. 03:05, sum: 10
min: .. 05:02, max: .. 05:19, sum: 15 [optional]

The last output is optional, but if an elegant solution has that as a side-effect, it's acceptable. If an elegant solution can't achieve that optional last output, it won't be a deal breaker.

Answer 1

It looks like you want to group records based on gaps between them of at least <N> minutes.

In SQL Server 2012+, you would use lag() to identify when groups start and a cumulative sum to identify the groups:

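declare @N int = 15;  -- gap threshold in minutes (example value)

select min(datefield), max(datefield), count(*) as num, sum(amount)
from (select c.*,
             -- running count of group starts: a row whose predecessor is
             -- more than @N minutes earlier opens a new group
             sum(case when prev_datefield < dateadd(minute, -@N, datefield)
                      then 1 else 0
                 end) over (order by datefield) as grp
      from (select c.*,
                   lag(datefield) over (order by datefield) as prev_datefield
            from Comparisons c
           ) c
      ) c
group by grp;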
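The lag-plus-running-sum idea can be checked outside the database. The following is a minimal Python sketch, not the answerer's code, that walks the question's reference rows in time order and opens a new group whenever the gap to the previous row exceeds N minutes. The function name `group_by_gap`, the value N = 15 and the amount of 5 per row are assumptions taken from the reference scenario.

```python
from datetime import datetime, timedelta

N = 15  # assumed gap threshold in minutes
times = ["10:04", "10:17", "12:01", "11:54", "03:02",
         "03:05", "05:02", "05:05", "05:19"]
# Every row carries Amount = 5, as in the reference scenario.
rows = [(datetime.strptime("2000-01-01 " + t, "%Y-%m-%d %H:%M"), 5) for t in times]

def group_by_gap(rows, n):
    """Return (min, max, count, sum) per group; a gap of more than n minutes starts a new group."""
    groups = []
    prev = None  # plays the role of lag(datefield)
    for ts, amount in sorted(rows):
        if prev is None or ts - prev > timedelta(minutes=n):
            groups.append([ts, ts, 0, 0])  # open a new group
        g = groups[-1]
        g[1] = ts        # latest timestamp seen in this group
        g[2] += 1        # row count
        g[3] += amount   # running sum of Amount
        prev = ts
    return [tuple(g) for g in groups]

result = group_by_gap(rows, N)
for lo, hi, num, total in result:
    print(f"{lo:%H:%M} {hi:%H:%M} {num} {total}")
# 03:02 03:05 2 10
# 05:02 05:19 3 15
# 10:04 10:17 2 10
# 11:54 12:01 2 10
```

On the reference data this yields the three required rows plus the optional 05:02 to 05:19 row, because 05:05 and 05:19 are only 14 minutes apart and therefore chain together.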

In earlier versions you can use correlated subqueries or APPLY to get the same functionality (albeit with much worse performance).

Answer 2

I believe this produces the results you want:

DECLARE @Comparisons TABLE (i DATETIME, amt INT NOT NULL DEFAULT(5));
INSERT @Comparisons (i) VALUES ('2016-01-01 10:04:00.000')
, ('2016-01-01 10:17:00.000')
, ('2016-01-01 10:25:00.000')
, ('2016-01-01 10:37:00.000')
, ('2016-01-01 10:44:00.000')
, ('2016-01-01 11:52:00.000')
, ('2016-01-01 11:59:00.000')
, ('2016-01-01 12:10:00.000')
, ('2016-01-01 12:22:00.000')
, ('2016-01-01 13:00:00.000')
, ('2016-01-01 09:00:00.000');

DECLARE @N INT = 15;

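WITH T AS (
    -- RN1 = 1 marks the first row of a chain, RN2 = 1 marks the last row.
    -- A NULL previ/nexti (first/last row overall) falls through to ELSE 1,
    -- so a multi-row final chain still gets a single maxtime.
    SELECT i
         , amt
         , CASE WHEN DATEDIFF(MINUTE, previ, i) <= @N THEN 0 ELSE 1 END RN1
         , CASE WHEN DATEDIFF(MINUTE, i, nexti) <= @N THEN 0 ELSE 1 END RN2
    FROM @Comparisons t
    OUTER APPLY (SELECT MAX(i) FROM @Comparisons WHERE i < t.i)x(previ)
    OUTER APPLY (SELECT MIN(i) FROM @Comparisons WHERE i > t.i)y(nexti)
    )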
, T2 AS (
    SELECT CASE RN1 WHEN 1 THEN i ELSE (SELECT MAX(i) FROM T WHERE RN1 = 1 AND i < T1.i) END mintime
         , CASE WHEN RN2 = 1 THEN i ELSE ISNULL((SELECT MIN(i) FROM T WHERE RN2 = 1 AND i > T1.i), i) END maxtime
         , amt
    FROM T T1
    )
SELECT mintime, maxtime, sum(amt) total
FROM T2
GROUP BY mintime, maxtime
ORDER BY mintime;

It's probably a little clunkier than it could be, but it's basically just grouping anything within an @N-minute chain.
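To see which chains the sample rows form, here is a small Python sketch of the same start/end marking idea. It is an illustration rather than a translation of the query: the names `starts`, `ends` and `chains` are made up for this example, and treating the last row overall as a chain end is an assumption of the sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

N = 15
stamps = ["10:04", "10:17", "10:25", "10:37", "10:44", "11:52",
          "11:59", "12:10", "12:22", "13:00", "09:00"]
rows = sorted(datetime.strptime("2016-01-01 " + s, "%Y-%m-%d %H:%M") for s in stamps)
limit = timedelta(minutes=N)

# A row starts a chain if it has no predecessor within N minutes (the role of RN1)
# and ends a chain if it has no successor within N minutes (the role of RN2).
starts = [t for k, t in enumerate(rows) if k == 0 or t - rows[k - 1] > limit]
ends = [t for k, t in enumerate(rows)
        if k == len(rows) - 1 or rows[k + 1] - t > limit]

totals = defaultdict(int)
for t in rows:
    mintime = max(s for s in starts if s <= t)  # latest chain start at or before t
    maxtime = min(e for e in ends if e >= t)    # earliest chain end at or after t
    totals[(mintime, maxtime)] += 5             # amt defaults to 5 in the sample

chains = [(lo.strftime("%H:%M"), hi.strftime("%H:%M"), total)
          for (lo, hi), total in sorted(totals.items())]
print(chains)
# [('09:00', '09:00', 5), ('10:04', '10:44', 25), ('11:52', '12:22', 20), ('13:00', '13:00', 5)]
```

Note that the 10:04 to 10:44 chain spans 40 minutes even though N is 15: membership depends on the gap between neighbouring rows, not on the total length of the chain.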

Answer 3

Intervals could be used, if adjacent intervals are checked. This would require multiplying the source table records by 3.

Pseudo-code:

select *
from Comparisons C, {-1, 0, 1} M
group by (datediff(mi, C.DateField, 0) / N) + M

The problem with this approach is how to eliminate the extra results. I suspect this is a dead-end approach, but someone else might see value in it.

Update: This approach would not work with the 4th expected output [min: .. 05:02, max: .. 05:19, sum: 15]

3 answers

Answer 1: score 2, posted 2016-07-06 03:32:07
Answer 2: score 2, accepted, posted 2016-07-06 05:04:03
Answer 3: score 0, posted 2016-07-06 03:34:40
