How to query and sort data by count where the time interval between rows is always longer than 5 minutes
I want to query a list of data whose data source looks like this:
ID EVENT TIME
--------------------------
A EVENT_1 2019-05-07 18:26:39.000
B EVENT_1 2019-05-07 18:31:39.000
C EVENT_3 2019-05-07 18:32:39.000
A EVENT_2 2019-05-07 18:32:39.000
A EVENT_2 2019-05-07 18:33:39.000
A EVENT_1 2019-05-07 18:34:39.000
B EVENT_2 2019-05-07 18:35:39.000
B EVENT_1 2019-05-07 18:36:39.000
C EVENT_2 2019-05-07 18:38:39.000
A EVENT_1 2019-05-07 18:40:39.000
--------------------------
First, when rows with the same ID trigger again within 5 minutes, keep only the earliest one (regardless of its event).
So the data should become:
ID EVENT TIME
--------------------------
A EVENT_1 2019-05-07 18:26:39.000
B EVENT_1 2019-05-07 18:31:39.000
C EVENT_3 2019-05-07 18:32:39.000
A EVENT_2 2019-05-07 18:32:39.000
C EVENT_2 2019-05-07 18:38:39.000
A EVENT_1 2019-05-07 18:40:39.000
--------------------------
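The rule above can be checked with a short greedy pass. This is a minimal Python sketch, not part of the original question. It assumes the 5 minutes are measured from the last row *kept* for the same ID. It also assumes that a row exactly 5 minutes later still counts as "within 5 minutes", which the expected output implies, since B at 18:36:39 is dropped. The `dedupe` function name is hypothetical.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (ID, EVENT, TIME)
ROWS = [
    ("A", "EVENT_1", "2019-05-07 18:26:39"),
    ("B", "EVENT_1", "2019-05-07 18:31:39"),
    ("C", "EVENT_3", "2019-05-07 18:32:39"),
    ("A", "EVENT_2", "2019-05-07 18:32:39"),
    ("A", "EVENT_2", "2019-05-07 18:33:39"),
    ("A", "EVENT_1", "2019-05-07 18:34:39"),
    ("B", "EVENT_2", "2019-05-07 18:35:39"),
    ("B", "EVENT_1", "2019-05-07 18:36:39"),
    ("C", "EVENT_2", "2019-05-07 18:38:39"),
    ("A", "EVENT_1", "2019-05-07 18:40:39"),
]

WINDOW = timedelta(minutes=5)


def dedupe(rows, window=WINDOW):
    """Keep a row only if it is more than `window` after the last kept row with the same ID."""
    parsed = [(ident, event, datetime.strptime(t, "%Y-%m-%d %H:%M:%S"))
              for ident, event, t in rows]
    parsed.sort(key=lambda r: r[2])  # stable sort: rows with equal times keep input order
    last_kept = {}  # ID -> time of the last row kept for that ID
    result = []
    for ident, event, ts in parsed:
        prev = last_kept.get(ident)
        if prev is None or ts > prev + window:
            last_kept[ident] = ts
            result.append((ident, event, ts))
    return result


for ident, event, ts in dedupe(ROWS):
    print(ident, event, ts)
```

Run on the sample data, this prints the same six rows as the expected output above (times without the `.000` suffix).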
Thanks. I am using SQL Server 2016.
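This is not as simple as it seems, because whether a row is kept depends on the last row that was kept for the same ID, not only on the row right before it. A recursive CTE can handle this by walking each ID's records one at a time: start from the earliest record, jump to the closest record with the same ID that is more than 5 minutes later, and repeat.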
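To illustrate why a plain previous-row comparison is not enough, here is a small Python sketch (not from the original answer) that uses hypothetical times of 0, 3 and 6 minutes for a single ID. `previous_row_filter` compares each row with the row before it, which is what `LAG(Time) OVER (PARTITION BY ID ORDER BY Time)` would give in SQL. `last_kept_filter` compares each row with the last kept row. Both function names are illustrative only.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

# Hypothetical rows for one ID at 0, 3 and 6 minutes (not the question's data)
base = datetime(2019, 5, 7, 18, 0, 0)
times = [base + timedelta(minutes=m) for m in (0, 3, 6)]


def previous_row_filter(times, window=WINDOW):
    """Keep a row if it is more than `window` after the previous row (LAG-style)."""
    kept = [times[0]]
    for prev, cur in zip(times, times[1:]):
        if cur > prev + window:
            kept.append(cur)
    return kept


def last_kept_filter(times, window=WINDOW):
    """Keep a row if it is more than `window` after the last kept row (the question's rule)."""
    kept = [times[0]]
    for cur in times[1:]:
        if cur > kept[-1] + window:
            kept.append(cur)
    return kept


print([t.strftime("%H:%M") for t in previous_row_filter(times)])  # ['18:00']
print([t.strftime("%H:%M") for t in last_kept_filter(times)])     # ['18:00', '18:06']
```

The 6-minute row should survive because the 3-minute row was dropped, but the previous-row comparison discards it. Each decision depends on earlier decisions, which is why the answer chains rows recursively.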
Setup:
IF OBJECT_ID('tempdb..#EventData') IS NOT NULL
    DROP TABLE #EventData;
CREATE TABLE #EventData (
    RowID INT IDENTITY,
    ID CHAR(1),
    Event VARCHAR(100),
    Time DATETIME);
INSERT INTO #EventData (
    ID,
    Event,
    Time)
VALUES
    ('A', 'EVENT_1', '2019-05-07 18:26:39.000'),
    ('B', 'EVENT_1', '2019-05-07 18:31:39.000'),
    ('C', 'EVENT_3', '2019-05-07 18:32:39.000'),
    ('A', 'EVENT_2', '2019-05-07 18:32:39.000'),
    ('A', 'EVENT_2', '2019-05-07 18:33:39.000'),
    ('A', 'EVENT_1', '2019-05-07 18:34:39.000'),
    ('B', 'EVENT_2', '2019-05-07 18:35:39.000'),
    ('B', 'EVENT_1', '2019-05-07 18:36:39.000'),
    ('C', 'EVENT_2', '2019-05-07 18:38:39.000'),
    ('A', 'EVENT_1', '2019-05-07 18:40:39.000');
Solution:
;WITH EarliestRecordByID AS
(
    -- Earliest event time for each ID: the starting point of each chain
    SELECT
        E.ID,
        MinTime = MIN(E.Time)
    FROM
        #EventData AS E
    GROUP BY
        E.ID
),
EventDataWithClosestRecord AS
(
    -- For every row, the first row with the same ID that is more than
    -- 5 minutes later (NULL when there is none)
    SELECT
        E.RowID,
        E.ID,
        E.Event,
        E.Time,
        ClosestRowID = T.RowID
    FROM
        #EventData AS E
        OUTER APPLY (
            SELECT TOP 1
                C.RowID
            FROM
                #EventData AS C
            WHERE
                C.ID = E.ID AND
                C.Time > DATEADD(MINUTE, 5, E.Time)
            ORDER BY
                C.Time,
                C.RowID) AS T -- RowID makes TOP 1 deterministic when times tie
),
RecursiveCTE AS
(
    -- Anchor: the earliest row of each ID
    SELECT
        E.ID,
        E.RowID,
        E.Event,
        E.Time,
        E.ClosestRowID
    FROM
        EventDataWithClosestRecord AS E
        INNER JOIN EarliestRecordByID AS M ON
            E.ID = M.ID AND
            E.Time = M.MinTime
    UNION ALL
    -- Recursive step: jump from the current row to its closest row
    SELECT
        R.ID,
        D.RowID,
        D.Event,
        D.Time,
        D.ClosestRowID
    FROM
        RecursiveCTE AS R
        INNER JOIN EventDataWithClosestRecord AS D ON R.ClosestRowID = D.RowID
)
SELECT
    R.ID,
    R.RowID,
    R.Event,
    R.Time
FROM
    RecursiveCTE AS R
ORDER BY
    R.Time,
    R.RowID
OPTION
    (MAXRECURSION 1000) -- Max recursion depth (default 100, 0 = unlimited); needs at least the longest chain of kept rows per ID minus 1
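To make the three CTEs easier to follow, here is a rough Python equivalent. It is an illustration only, not part of the original answer, and it is O(n²), so it is meant for checking the logic rather than for real data. Each step is labeled with the CTE it mirrors. Breaking ties by RowID when picking the earliest row is an assumption of this sketch. The function name `chain_by_id` is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)

# (RowID, ID, Event, Time): the same sample rows as #EventData
EVENTS = [
    (1, "A", "EVENT_1", "2019-05-07 18:26:39"),
    (2, "B", "EVENT_1", "2019-05-07 18:31:39"),
    (3, "C", "EVENT_3", "2019-05-07 18:32:39"),
    (4, "A", "EVENT_2", "2019-05-07 18:32:39"),
    (5, "A", "EVENT_2", "2019-05-07 18:33:39"),
    (6, "A", "EVENT_1", "2019-05-07 18:34:39"),
    (7, "B", "EVENT_2", "2019-05-07 18:35:39"),
    (8, "B", "EVENT_1", "2019-05-07 18:36:39"),
    (9, "C", "EVENT_2", "2019-05-07 18:38:39"),
    (10, "A", "EVENT_1", "2019-05-07 18:40:39"),
]


def chain_by_id(events, window=WINDOW):
    rows = [(rid, ident, ev, datetime.strptime(t, "%Y-%m-%d %H:%M:%S"))
            for rid, ident, ev, t in events]
    by_row_id = {r[0]: r for r in rows}

    # EarliestRecordByID: one starting row per ID (ties broken by RowID; the SQL
    # join would instead start a chain from every row tied at the minimum time)
    earliest = {}
    for r in sorted(rows, key=lambda r: (r[3], r[0])):
        earliest.setdefault(r[1], r)

    # EventDataWithClosestRecord: RowID of the first row with the same ID that is
    # more than `window` later, or None (where OUTER APPLY yields NULL)
    def closest(r):
        later = [c for c in rows if c[1] == r[1] and c[3] > r[3] + window]
        return min(later, key=lambda c: (c[3], c[0]))[0] if later else None

    closest_row_id = {r[0]: closest(r) for r in rows}

    # RecursiveCTE: start at each earliest row and follow ClosestRowID until None
    result = []
    for current in earliest.values():
        while current is not None:
            result.append(current)
            nxt = closest_row_id[current[0]]
            current = by_row_id[nxt] if nxt is not None else None

    # Final SELECT ... ORDER BY Time, RowID
    return sorted(result, key=lambda r: (r[3], r[0]))


for rid, ident, ev, ts in chain_by_id(EVENTS):
    print(ident, rid, ev, ts)
```

On the sample data, this yields RowIDs 1, 2, 3, 4, 9 and 10, the same rows as the result shown below.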
Result:
ID RowID Event Time
A 1 EVENT_1 2019-05-07 18:26:39.000
B 2 EVENT_1 2019-05-07 18:31:39.000
C 3 EVENT_3 2019-05-07 18:32:39.000
A 4 EVENT_2 2019-05-07 18:32:39.000
C 9 EVENT_2 2019-05-07 18:38:39.000
A 10 EVENT_1 2019-05-07 18:40:39.000
There may be another, probably faster, solution using window functions, since recursive CTEs in SQL Server generally perform much worse than equivalent set-based, non-recursive queries.