Group table into 15 minute intervals

T-SQL, SQL Server 2008 and up

Given a sample table of

 StatusSetDateTime   | UserID | Status    | StatusEndDateTime   | StatusDuration(in seconds)
============================================================================
 2012-01-01 12:00:00 | myID   | Available | 2012-01-01 13:00:00 | 3600

I need to break that down into a view that uses 15-minute intervals, for example:

 IntervalStart       | UserID | Status    | Duration
=====================================================
 2012-01-01 12:00:00 | myID   | Available | 900
 2012-01-01 12:15:00 | myID   | Available | 900
 2012-01-01 12:30:00 | myID   | Available | 900
 2012-01-01 12:45:00 | myID   | Available | 900
 2012-01-01 13:00:00 | myID   | Available | 0

etc.

I've searched around and found some queries that break data down into intervals: something similar for MySQL here, and something for T-SQL here.

But in the second example they are summing the results, whereas I need to divide the total duration by the interval time (900 seconds), by user, by status.

I was able to adapt the examples in the second link to split everything into intervals, but the total duration is returned, and I cannot quite figure out how to get the interval durations to split (and still sum to the original total duration).

Thanks in advance for any insight!

Edit: First attempt

 ;with cte as 
    (select MIN(StatusDateTime) as MinDate
          , MAX(StatusDateTime) as MaxDate
          , convert(varchar(14),StatusDateTime, 120) as StartDate
          , DATEPART(minute, StatusDateTime) /15 as GroupID
          , UserID
          , StatusKey
          , avg(StateDuration) as AvgAmount
     from AgentActivityLog
     group by convert(varchar(14),StatusDateTime, 120)
         , DATEPART(minute, StatusDateTime) /15
         , Userid,StatusKey)

  select dateadd(minute, 15*GroupID, CONVERT(datetime,StartDate+'00'))
         as [Start Date]
       , UserID, StatusKey, AvgAmount as [Average Amount]
  from cte

Edit: Second attempt

;With cte As
   (Select DateAdd(minute
                   , 15 * (DateDiff(minute, '20000101', StatusDateTime) / 15)
                   , '20000101') As StatusDateTime
         , userid, statuskey, StateDuration
    From AgentActivityLog)

 Select StatusDateTime, userid,statuskey,Avg(StateDuration)
 From cte
 Group By StatusDateTime,userid,statuskey;

;with cte_max as 
(
   select dateadd(mi, -15, max(StatusEndDateTime)) as EndTime, min(StatusSetDateTime) as StartTime
   from AgentActivityLog
), times as
(
    select StartTime as Time from cte_max
    union all
    select dateadd(mi, 15, c.Time)
    from times as c
        cross join cte_max as cm
    where c.Time <= cm.EndTime
)
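select
    t.Time, A.UserID, A.Status,
    case
        when t.Time = A.StatusEndDateTime then 0
        -- nullif guards against division by zero when a status matches only one interval boundary
        else A.StatusDuration / nullif(count(*) over (partition by A.StatusSetDateTime, A.UserID, A.Status) - 1, 0)
    end as Duration
from AgentActivityLog as A
    left outer join times as t on t.Time >= A.StatusSetDateTime and t.Time <= A.StatusEndDateTime
-- the times CTE recurses once per 15-minute step, so lift the default limit of 100 levels
option (maxrecursion 0)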

SQL Fiddle demo

I've never been comfortable with using date math to split things up into partitions. It seems like there are all kinds of pitfalls to fall into.

What I prefer to do is to create a table (a predefined table, a table-valued function, or a table variable) where there's one row for each date partition range. The table-valued function approach is particularly useful because you can build it for arbitrary ranges and partition sizes as you need. Then, you can join to this table to split things out.

partitionid starttime     endtime
----------- ------------- -------------
1           8/1/2012 5:00 8/1/2012 5:15
2           8/1/2012 5:15 8/1/2012 5:30
...

I can't speak to the performance of this method, but I find the queries are much more intuitive.
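
As a rough sketch of the table-valued function idea, something like the following could generate the partition rows on demand and then be joined to the log table. The function name `dbo.GetTimePartitions` and its parameters are hypothetical (they are not from the posts here), and the overlap calculation is just one possible way to split durations. Unlike the output requested in the question, it produces no trailing 0-duration row.

```sql
-- Hypothetical helper (name and parameters are illustrative assumptions).
-- Returns one row per partition that starts inside [@RangeStart, @RangeEnd),
-- up to 65,536 partitions.
CREATE FUNCTION dbo.GetTimePartitions
(
    @RangeStart       datetime,
    @RangeEnd         datetime,
    @PartitionMinutes int
)
RETURNS TABLE
AS
RETURN
    WITH
    L0 AS (SELECT 1 AS c UNION ALL SELECT 1),               -- 2 rows
    L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),  -- 4 rows
    L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),  -- 16 rows
    L3 AS (SELECT 1 AS c FROM L2 AS a CROSS JOIN L2 AS b),  -- 256 rows
    L4 AS (SELECT 1 AS c FROM L3 AS a CROSS JOIN L3 AS b),  -- 65,536 rows
    Nums AS (SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) AS n FROM L4)
    SELECT
        n AS partitionid,
        DATEADD(minute, (n - 1) * @PartitionMinutes, @RangeStart) AS starttime,
        DATEADD(minute, n * @PartitionMinutes, @RangeStart)       AS endtime
    FROM Nums
    WHERE DATEADD(minute, (n - 1) * @PartitionMinutes, @RangeStart) < @RangeEnd;
GO  -- batch separator for SSMS/sqlcmd; CREATE FUNCTION must be alone in its batch

-- Join each status row to every partition it overlaps.
SELECT
    p.starttime AS IntervalStart, a.UserID, a.Status,
    -- seconds of overlap between the status row and the partition
    DATEDIFF(second,
        CASE WHEN a.StatusSetDateTime > p.starttime THEN a.StatusSetDateTime ELSE p.starttime END,
        CASE WHEN a.StatusEndDateTime < p.endtime   THEN a.StatusEndDateTime ELSE p.endtime   END
    ) AS Duration
FROM AgentActivityLog AS a
    INNER JOIN dbo.GetTimePartitions('20120101', '20120102', 15) AS p
        ON a.StatusSetDateTime < p.endtime
       AND a.StatusEndDateTime > p.starttime
ORDER BY a.UserID, p.starttime;
```

For the sample row in the question (12:00 to 13:00), this join should match the four partitions starting at 12:00, 12:15, 12:30 and 12:45, each with 900 seconds of overlap.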

It is relatively simple if you have a helper table with every 15-minute timestamp, which you JOIN to your base table via BETWEEN. You can build the helper table on the fly or keep it permanently in your database. Simple for the next guy at your company to figure out too:

-- declare a table and a timestamp variable
declare @timetbl table(t datetime)
declare @t datetime

-- set the first timestamp
set @t = '2012-01-01 00:00:00'

-- set the last timestamp, can easily be extended to cover many years
while @t <= '2013-01-01'
begin
    -- populate the table with a new row, every 15 minutes
    insert into @timetbl values (@t)
    set @t = dateadd(mi, 15, @t)
end


-- now the Select query:
select 
   tt.t, aal.UserID, aal.Status,
   case when aal.StatusEndDateTime <= tt.t then 0 else 900 end as Duration
   -- using a shortcut for Duration, based on your comment that Start/End are always on the quarter-hour, and thus always 900 seconds or zero

from 
   @timetbl tt 
      INNER JOIN AgentActivityLog aal 
         on tt.t between aal.StatusSetDateTime and aal.StatusEndDateTime

order by
  aal.UserID, tt.t

You can use a recursive Common Table Expression, where you keep adding your duration while the StatusEndDateTime is greater than the IntervalStart, e.g.

;with cte as (
    select StatusSetDateTime as IntervalStart
        ,UserID
        ,Status
        ,StatusDuration/(datediff(mi, StatusSetDateTime, StatusEndDateTime)/15) as Duration
        , StatusEndDateTime
    From AgentActivityLog
    Union all
    Select DATEADD(ss, Duration, IntervalStart) as IntervalStart
        , UserID
        , Status
        , case when DATEADD(ss, Duration, IntervalStart) = StatusEndDateTime then 0 else Duration end as Duration
        , StatusEndDateTime
    From cte
    Where IntervalStart < StatusEndDateTime
)

select IntervalStart, UserID, Status, Duration from cte
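
A self-contained variant of the recursive query above, offered as a sketch rather than a drop-in replacement. The table variable `@Log` is a hypothetical stand-in for AgentActivityLog, and the `NULLIF` guard, the `Duration > 0` check and the `MAXRECURSION` hint are additions not present in the original answer. They are meant to cover two cases the original does not: a status shorter than 15 minutes (division by zero) and a status spanning more than 100 intervals (the default recursion limit). Like the original, it assumes start and end times fall on quarter-hour boundaries.

```sql
-- Hypothetical stand-in for AgentActivityLog, loaded with the question's sample row.
DECLARE @Log table (
    StatusSetDateTime datetime,
    UserID            varchar(20),
    Status            varchar(20),
    StatusEndDateTime datetime,
    StatusDuration    int
);
INSERT INTO @Log VALUES ('20120101 12:00:00', 'myID', 'Available', '20120101 13:00:00', 3600);

WITH cte AS (
    -- Anchor: first interval of each status row.
    SELECT StatusSetDateTime AS IntervalStart
        , UserID
        , Status
        -- NULLIF turns a zero interval count into NULL instead of raising an error
        , StatusDuration / NULLIF(DATEDIFF(mi, StatusSetDateTime, StatusEndDateTime) / 15, 0) AS Duration
        , StatusEndDateTime
    FROM @Log
    UNION ALL
    -- Recursive step: advance by one interval until the end time is reached.
    SELECT DATEADD(ss, Duration, IntervalStart)
        , UserID
        , Status
        , CASE WHEN DATEADD(ss, Duration, IntervalStart) = StatusEndDateTime THEN 0 ELSE Duration END
        , StatusEndDateTime
    FROM cte
    WHERE IntervalStart < StatusEndDateTime
      AND Duration > 0          -- also stops when Duration is NULL
)
SELECT IntervalStart, UserID, Status, Duration
FROM cte
ORDER BY UserID, IntervalStart
OPTION (MAXRECURSION 0);        -- lift the default limit of 100 recursion levels
```

With the sample row this should return the five rows shown in the question: four 900-second intervals from 12:00 to 12:45, plus the 0-duration row at 13:00.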

Here's a query that will do the job for you without requiring helper tables. (I have nothing against helper tables; they are useful and I use them. It is also possible to not use them sometimes.) This query allows for activities to start and end at any times, even if they are not whole minutes ending in :00, :15, :30, :45. If there will be millisecond portions then you'll have to do some experimenting because, following your model, I only went to second resolution.

If you have a known hard maximum duration, then remove @MaxDuration and replace it with that value, in minutes. N <= @MaxDuration is crucial to the query performing well.

DECLARE @MaxDuration int;
SET @MaxDuration = (SELECT Max(StatusDuration) / 60 FROM #AgentActivityLog);

WITH
L0 AS(SELECT 1 c UNION ALL SELECT 1),
L1 AS(SELECT 1 c FROM L0, L0 B),
L2 AS(SELECT 1 c FROM L1, L1 B),
L3 AS(SELECT 1 c FROM L2, L2 B),
L4 AS(SELECT 1 c FROM L3, L3 B),
L5 AS(SELECT 1 c FROM L4, L4 B),
Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) n FROM L5)
SELECT
   S.IntervalStart,
   Duration = DateDiff(second, S.IntervalStart, E.IntervalEnd)
FROM
   #AgentActivityLog L
   CROSS APPLY (
      SELECT N, Offset = (N.N - 1) * 900
      FROM Nums N
      WHERE N <= @MaxDuration
   ) N
   CROSS APPLY (
      SELECT Edge =
         DateAdd(second, N.Offset, DateAdd(minute,
            DateDiff(minute, '20000101', L.StatusSetDateTime)
            / 15 * 15, '20000101')
         )
   ) G
   CROSS APPLY (
      SELECT IntervalStart = Max(T.BeginTime)
      FROM (
         SELECT L.StatusSetDateTime
         UNION ALL SELECT G.Edge
      ) T (BeginTime)
   ) S
   CROSS APPLY (
      SELECT IntervalEnd = Min(T.EndTime)
      FROM (
         SELECT L.StatusEndDateTime
         UNION ALL SELECT G.Edge + '00:15:00'
      ) T (EndTime)
   ) E
WHERE
   N.Offset <= L.StatusDuration
ORDER BY
   L.StatusSetDateTime,
   S.IntervalStart;
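
The `Edge` expression in the query above floors a timestamp to the start of its 15-minute bucket. It can be tried in isolation with a small sketch like this one. The variable name is only for illustration, and the value is one of the "weird" times from the setup script:

```sql
DECLARE @StatusSetDateTime datetime;
SET @StatusSetDateTime = '20120101 13:27:56';

SELECT
    @StatusSetDateTime AS StatusSetDateTime,
    -- whole minutes since the anchor date, truncated to a multiple of 15
    -- by integer division, then added back onto the anchor
    DATEADD(minute,
        DATEDIFF(minute, '20000101', @StatusSetDateTime) / 15 * 15,
        '20000101') AS Edge;
-- Edge: 2012-01-01 13:15:00.000
```

Because the anchor '20000101' is itself midnight, every multiple of 15 minutes from it lands on :00, :15, :30 or :45. The query then takes the later of `Edge` and the real start time, and the earlier of `Edge` plus 15 minutes and the real end time, to get each interval's actual bounds.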

Here is a setup script if you want to try it:

CREATE TABLE #AgentActivityLog (
    StatusSetDateTime datetime,
    StatusEndDateTime datetime,
    StatusDuration AS (DateDiff(second, 0, StatusEndDateTime - StatusSetDateTime))
);

INSERT #AgentActivityLog -- weird end times
SELECT '20120101 12:00:00', '20120101 13:00:00'
UNION ALL SELECT '20120101 13:00:00', '20120101 13:27:56'
UNION ALL SELECT '20120101 13:27:56', '20120101 13:28:52'
UNION ALL SELECT '20120101 13:28:52', '20120120 11:00:00'

INSERT #AgentActivityLog -- 15-minute quantized end times
SELECT '20120101 12:00:00', '20120101 13:00:00'
UNION ALL SELECT '20120101 13:00:00', '20120101 13:30:00'
UNION ALL SELECT '20120101 13:30:00', '20120101 14:00:00'
UNION ALL SELECT '20120101 14:00:00', '20120120 11:00:00'

Also, here's a version that expects ONLY times that have whole minutes ending in :00, :15, :30, or :45.

DECLARE @MaxDuration int;
SET @MaxDuration = (SELECT Max(StatusDuration) / 60 FROM #AgentActivityLog);

WITH
L0 AS(SELECT 1 c UNION ALL SELECT 1),
L1 AS(SELECT 1 c FROM L0, L0 B),
L2 AS(SELECT 1 c FROM L1, L1 B),
L3 AS(SELECT 1 c FROM L2, L2 B),
L4 AS(SELECT 1 c FROM L3, L3 B),
L5 AS(SELECT 1 c FROM L4, L4 B),
Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) n FROM L5)
SELECT
   S.IntervalStart,
   Duration = CASE WHEN Offset = StatusDuration THEN 0 ELSE 900 END
FROM
   #AgentActivityLog L
   CROSS APPLY (
      SELECT N, Offset = (N.N - 1) * 900
      FROM Nums N
      WHERE N <= @MaxDuration
   ) N
   CROSS APPLY (
      SELECT IntervalStart = DateAdd(second, N.Offset, L.StatusSetDateTime)
   ) S
WHERE
   N.Offset <= L.StatusDuration   
ORDER BY
   L.StatusSetDateTime,
   S.IntervalStart;

It really seems like having the final 0 Duration row is not correct, because then you can't just order by IntervalStart as there are duplicate IntervalStart values. What is the benefit of having rows that add 0 to the total?
