
How to query and sort data so that the time intervals between rows with the same ID are all longer than 5 minutes

I want to query a list of data whose source looks like this:

ID       EVENT      TIME
--------------------------
A       EVENT_1     2019-05-07 18:26:39.000
B       EVENT_1     2019-05-07 18:31:39.000 
C       EVENT_3     2019-05-07 18:32:39.000
A       EVENT_2     2019-05-07 18:32:39.000
A       EVENT_2     2019-05-07 18:33:39.000
A       EVENT_1     2019-05-07 18:34:39.000
B       EVENT_2     2019-05-07 18:35:39.000
B       EVENT_1     2019-05-07 18:36:39.000
C       EVENT_2     2019-05-07 18:38:39.000
A       EVENT_1     2019-05-07 18:40:39.000
--------------------------

First, when rows with the same ID are triggered again within 5 minutes, keep only the earliest row (regardless of what its event is).

So the data should become:

ID       EVENT      TIME
--------------------------
A       EVENT_1     2019-05-07 18:26:39.000
B       EVENT_1     2019-05-07 18:31:39.000 
C       EVENT_3     2019-05-07 18:32:39.000
A       EVENT_2     2019-05-07 18:32:39.000
C       EVENT_2     2019-05-07 18:38:39.000
A       EVENT_1     2019-05-07 18:40:39.000
--------------------------

Thanks. I am using SQL Server 2016.

This is not as simple as it seems. A recursive CTE can walk the rows of each ID one at a time, linking every kept row to the closest later row with the same ID that is more than 5 minutes after it.

Set up:

IF OBJECT_ID('tempdb..#EventData') IS NOT NULL
    DROP TABLE #EventData;

CREATE TABLE #EventData (
    RowID INT IDENTITY,
    ID CHAR(1),
    Event VARCHAR(100),
    Time DATETIME);

-- Sample rows from the question (no stray spaces inside the literals)
INSERT INTO #EventData (
    ID,
    Event,
    Time)
VALUES
    ('A', 'EVENT_1', '2019-05-07 18:26:39.000'),
    ('B', 'EVENT_1', '2019-05-07 18:31:39.000'),
    ('C', 'EVENT_3', '2019-05-07 18:32:39.000'),
    ('A', 'EVENT_2', '2019-05-07 18:32:39.000'),
    ('A', 'EVENT_2', '2019-05-07 18:33:39.000'),
    ('A', 'EVENT_1', '2019-05-07 18:34:39.000'),
    ('B', 'EVENT_2', '2019-05-07 18:35:39.000'),
    ('B', 'EVENT_1', '2019-05-07 18:36:39.000'),
    ('C', 'EVENT_2', '2019-05-07 18:38:39.000'),
    ('A', 'EVENT_1', '2019-05-07 18:40:39.000');

Solution:

-- Anchor rows: the earliest time per ID starts each chain.
;WITH EarliestRecordByID AS
(
    SELECT
        E.ID,
        MinTime = MIN(E.Time)
    FROM
        #EventData AS E
    GROUP BY
        E.ID
),
-- For every row, find the next row of the same ID that is more than 5 minutes later.
EventDataWithClosestRecord AS
(
    SELECT
        E.RowID,
        E.ID,
        E.Event,
        E.Time,
        ClosestRowID = T.RowID
    FROM
        #EventData AS E
        OUTER APPLY (
            SELECT TOP 1
                C.RowID
            FROM
                #EventData AS C
            WHERE
                C.ID = E.ID AND
                C.Time > DATEADD(MINUTE, 5, E.Time)
            ORDER BY
                C.Time,
                C.RowID) AS T -- RowID breaks ties between equal times
),
-- Start at each anchor row and follow ClosestRowID until no later row qualifies.
RecursiveCTE AS
(
    SELECT
        E.ID,
        E.RowID,
        E.Event,
        E.Time,
        E.ClosestRowID
    FROM
        EventDataWithClosestRecord AS E
        INNER JOIN EarliestRecordByID AS M ON 
            E.ID = M.ID AND
            E.Time = M.MinTime

    UNION ALL

    SELECT
        R.ID,
        D.RowID,
        D.Event,
        D.Time,
        D.ClosestRowID
    FROM
        RecursiveCTE AS R
        INNER JOIN EventDataWithClosestRecord AS D ON R.ClosestRowID = D.RowID
)
SELECT
    R.ID,
    R.RowID,
    R.Event,
    R.Time
FROM
    RecursiveCTE AS R
ORDER BY
    R.Time
OPTION
    (MAXRECURSION 1000) -- Your max recursion level here (0 for unlimited)

Result:

ID  RowID   Event       Time
A   1       EVENT_1     2019-05-07 18:26:39.000
B   2       EVENT_1     2019-05-07 18:31:39.000
C   3       EVENT_3     2019-05-07 18:32:39.000
A   4       EVENT_2     2019-05-07 18:32:39.000
C   9       EVENT_2     2019-05-07 18:38:39.000
A   10      EVENT_1     2019-05-07 18:40:39.000
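
As a cross-check of this result, the chain that the recursive CTE follows can be sketched outside the database. The Python below is a minimal sketch and not part of the T-SQL solution. `ROWS` and `follow_chains` are hypothetical names introduced here, and the rows are the sample data from the set-up. It assumes the same "strictly more than 5 minutes" comparison as the query.

```python
from datetime import datetime, timedelta

# (RowID, ID, Event, Time): the sample data from the set-up
ROWS = [
    (1, "A", "EVENT_1", "2019-05-07 18:26:39"),
    (2, "B", "EVENT_1", "2019-05-07 18:31:39"),
    (3, "C", "EVENT_3", "2019-05-07 18:32:39"),
    (4, "A", "EVENT_2", "2019-05-07 18:32:39"),
    (5, "A", "EVENT_2", "2019-05-07 18:33:39"),
    (6, "A", "EVENT_1", "2019-05-07 18:34:39"),
    (7, "B", "EVENT_2", "2019-05-07 18:35:39"),
    (8, "B", "EVENT_1", "2019-05-07 18:36:39"),
    (9, "C", "EVENT_2", "2019-05-07 18:38:39"),
    (10, "A", "EVENT_1", "2019-05-07 18:40:39"),
]
WINDOW = timedelta(minutes=5)


def follow_chains(rows):
    """Keep the earliest row per ID, then hop to the next row > 5 min later."""
    parsed = [(rid, key, ev, datetime.fromisoformat(ts)) for rid, key, ev, ts in rows]

    # Group rows by ID, each group ordered by time (RowID breaks ties).
    groups = {}
    for row in sorted(parsed, key=lambda r: (r[3], r[0])):
        groups.setdefault(row[1], []).append(row)

    kept = []
    for group in groups.values():
        current = group[0]  # anchor: earliest row of this ID
        while current is not None:
            kept.append(current)
            limit = current[3] + WINDOW
            # Same role as the OUTER APPLY: first row strictly after the limit.
            current = next((r for r in group if r[3] > limit), None)

    return sorted(kept, key=lambda r: (r[3], r[0]))


if __name__ == "__main__":
    print([row[0] for row in follow_chains(ROWS)])  # RowIDs: [1, 2, 3, 4, 9, 10]
```

The RowIDs it prints match the RowID column of the result above.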

There might be another (probably faster) solution using window functions, since recursive CTEs in SQL Server usually perform much worse than non-recursive, set-based queries.
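
One reason a non-recursive approach seems plausible: written procedurally, the rule is a single pass over the rows in time order that only remembers the last kept time per ID. The Python below is a sketch of that idea only. It is not T-SQL and not a window-function query. `SAMPLE` and `keep_spaced_rows` are hypothetical names, and the comparison assumes the "strictly more than 5 minutes" reading used in the solution.

```python
from datetime import datetime, timedelta

# (ID, Event, Time): the sample data from the question
SAMPLE = [
    ("A", "EVENT_1", "2019-05-07 18:26:39"),
    ("B", "EVENT_1", "2019-05-07 18:31:39"),
    ("C", "EVENT_3", "2019-05-07 18:32:39"),
    ("A", "EVENT_2", "2019-05-07 18:32:39"),
    ("A", "EVENT_2", "2019-05-07 18:33:39"),
    ("A", "EVENT_1", "2019-05-07 18:34:39"),
    ("B", "EVENT_2", "2019-05-07 18:35:39"),
    ("B", "EVENT_1", "2019-05-07 18:36:39"),
    ("C", "EVENT_2", "2019-05-07 18:38:39"),
    ("A", "EVENT_1", "2019-05-07 18:40:39"),
]


def keep_spaced_rows(rows, window=timedelta(minutes=5)):
    """Keep a row only if it is more than `window` after the last kept row of its ID."""
    last_kept = {}  # ID -> time of the most recently kept row
    kept = []
    # sorted() is stable, so rows with equal times stay in input order.
    for row in sorted(rows, key=lambda r: datetime.fromisoformat(r[2])):
        key, _event, ts = row
        when = datetime.fromisoformat(ts)
        previous = last_kept.get(key)
        if previous is None or when - previous > window:
            last_kept[key] = when
            kept.append(row)
    return kept


if __name__ == "__main__":
    for row in keep_spaced_rows(SAMPLE):
        print(*row)
```

Each row is visited once after sorting, so the cost is dominated by the sort. Whether the same effect can be expressed with window functions alone is left open here, because each decision depends on the previously kept row rather than on the previous row.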
