Calculation of time a record spent in particular status per day

Could anybody please help me with the following task?

Here is the problem: we have a history table (status changes of processes) and we need to calculate how much time (in hours per day) a process spent in a particular status. Here is the structure of the history table:

ID| ProcessId| CreatedDate         | Status 
------------------------------------------- 
1 | Process1 | 2016-01-09 06:30:00 | UP
2 | Process1 | 2016-01-09 12:30:00 | UP
3 | Process1 | 2016-01-09 18:30:00 | DOWN
4 | Process1 | 2016-01-10 00:30:00 | UP 
5 | Process2 | 2016-01-08 18:30:00 | UP
6 | Process2 | 2016-01-09 00:30:00 | DOWN
7 | Process2 | 2016-01-09 06:30:00 | DOWN
8 | Process2 | 2016-01-09 12:30:00 | DOWN
9 | Process2 | 2016-01-09 18:30:00 | DOWN
10| Process2 | 2016-01-10 00:30:00 | UP
11| Process2 | 2016-01-10 06:30:00 | UP
12| Process2 | 2016-01-10 12:30:00 | DOWN
13| Process2 | 2016-01-10 18:30:00 | DOWN
14| Process2 | 2016-01-11 00:30:00 | DOWN
15| Process2 | 2016-01-11 06:30:00 | DOWN

As a result, we need to create a view / table like:

ProcessId | Status | Date        | TimeSpentInStatusInDays
----------------------------------------------------------
Process1  | UP     | 2016-01-09  | 12h 00m
Process1  | DOWN   | 2016-01-09  | 05h 30m
Process1  | UP     | 2016-01-10  | 00h 00m
Process1  | DOWN   | 2016-01-10  | 00h 30m
Process2  | UP     | 2016-01-08  | 05h 30m
Process2  | DOWN   | 2016-01-08  | 00h 00m
Process2  | UP     | 2016-01-09  | 24h 00m
Process2  | DOWN   | 2016-01-09  | 00h 00m
Process2  | UP     | 2016-01-10  | 12h 00m
Process2  | DOWN   | 2016-01-10  | 12h 00m
Process2  | UP     | 2016-01-11  | 00h 00m
Process2  | DOWN   | 2016-01-11  | 06h 30m

The values are only examples (they are not derived from the actual data set above).

The code needs to be in MySQL. Any help would be much appreciated. Thanks.

I'm not promising this is a good way to do this in MySQL or that it's fast.

I take your history table and append rows where necessary for the end of each day (except the last day per process). The added rows contain the status of the final row per process per day. This could indeed result in an instantaneous status change at midnight if such a row already existed. (I tried to handle this scenario later.)

Since MySQL doesn't have lead/lag functions (window functions only arrived in version 8.0), I'm matching up each row of two identical copies of the above to find the next time in sequence (which may be the logical status row added for end of day). After that it's just a matter of grouping.

Since I'm not as familiar with MySQL date functions, I just went with time_to_sec, since the span can never be more than a day. The only complication is that midnight has to be treated specially. I'll let you deal with converting the seconds value to an appropriate output format.
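
For the output format, one possible conversion from a seconds total to the `12h 00m` style shown in the question is sketched here. It is written in Python rather than MySQL, purely to illustrate the arithmetic; the function name `format_hours_minutes` is made up for this example, and leftover seconds are simply truncated.

```python
def format_hours_minutes(total_seconds: int) -> str:
    """Render a second count in the question's 'HHh MMm' style."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes = remainder // 60  # any leftover seconds are dropped
    return f"{hours:02d}h {minutes:02d}m"
```

With this, 19800 seconds comes out as `05h 30m` and a full day (86400 seconds) as `24h 00m`.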

http://sqlfiddle.com/#!9/b0f3279/44

select
    ProcessId,
    date(CreatedDate) as `Date`,
    Status,
    sum(
        case
            when time_to_sec(NextDate) = 0 then 86400
            else time_to_sec(NextDate)
        end - time_to_sec(CreatedDate)
    ) as TimeSpentSeconds
from
    (
    select
        h1.ProcessId, h1.CreatedDate, h1.Status,
        min(
            h2.CreatedDate
            -- (MySQL needs a space after "--" for a line comment)
            -- case
            --     when date(h2.CreatedDate) > date(h1.CreatedDate)
            --     then date_add(date(h1.CreatedDate), interval 1 day)
            --     else h2.CreatedDate
            -- end
        ) as NextDate
    from
        (
        select ProcessId, CreatedDate, Status from history
        union
        select
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h1
            inner join
        (
        select ProcessId, CreatedDate, Status from history
        union
        select
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h2
            on      h2.ProcessId   = h1.ProcessId
                and h1.CreatedDate < h2.CreatedDate
                and h2.CreatedDate <= date_add(date(h1.CreatedDate), interval 1 day)
    group by h1.ProcessId, h1.CreatedDate, h1.Status
    ) hx
group by ProcessId, date(CreatedDate), Status
order by ProcessId, `Date`, Status desc, TimeSpentSeconds
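
To sanity-check the numbers such a query should produce, the same idea (hold each status until the next reading, cut every interval at midnight, then sum per process, day and status) can be sketched outside the database. This is only a rough Python cross-check, not a translation of the SQL: `seconds_per_day` and `rows` are names invented here, the sample rows are the Process1 rows from the question, and unlike the query it does not rely on there being at least one reading per day.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def seconds_per_day(rows):
    """rows: iterable of (process_id, 'YYYY-MM-DD HH:MM:SS', status).

    Returns {(process_id, date, status): seconds}. Each reading is assumed
    to hold until the next reading of the same process; the last reading of
    a process contributes nothing because its end time is unknown.
    """
    by_process = defaultdict(list)
    for process_id, created, status in rows:
        stamp = datetime.strptime(created, "%Y-%m-%d %H:%M:%S")
        by_process[process_id].append((stamp, status))

    totals = defaultdict(int)
    for process_id, readings in by_process.items():
        readings.sort(key=lambda reading: reading[0])
        for (start, status), (end, _) in zip(readings, readings[1:]):
            # Cut the interval at every midnight it crosses.
            while start < end:
                next_midnight = datetime.combine(
                    start.date() + timedelta(days=1), datetime.min.time())
                piece_end = min(end, next_midnight)
                totals[(process_id, start.date(), status)] += int(
                    (piece_end - start).total_seconds())
                start = piece_end
    return dict(totals)


# Process1 rows from the question's history table.
rows = [
    ("Process1", "2016-01-09 06:30:00", "UP"),
    ("Process1", "2016-01-09 12:30:00", "UP"),
    ("Process1", "2016-01-09 18:30:00", "DOWN"),
    ("Process1", "2016-01-10 00:30:00", "UP"),
]
totals = seconds_per_day(rows)
```

For these rows the sketch gives 43200 seconds UP and 19800 seconds DOWN on 2016-01-09, and 1800 seconds DOWN on 2016-01-10, which lines up with the 12h 00m / 05h 30m / 00h 30m figures in the question's example output.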

I believe this second option would handle the instantaneous/duplicate status scenario mentioned above. It was already a little complicated, but this feels a lot messier. I added a sort of sequence number to facilitate a tie break and tweaked the time difference expression. Finally, I included a having clause to keep rows with zero accumulation out of the output. Refer to ProcessX in the fiddle's sample data:

select
    ProcessId,
    date(CreatedDate) as `Date`,
    Status,
    sum(
        case
            when NextDate > CreatedDate and time_to_sec(NextDate) = 0 then 86400
            else time_to_sec(NextDate)
        end - time_to_sec(CreatedDate)
    ) as TimeSpentSeconds
from
    (
    select
        h1.ProcessId, h1.CreatedDate, h1.Status,
        min(
            h2.CreatedDate
            -- (MySQL needs a space after "--" for a line comment)
            -- case
            --     when date(h2.CreatedDate) > date(h1.CreatedDate)
            --     then date_add(date(h1.CreatedDate), interval 1 day)
            --     else h2.CreatedDate
            -- end
        ) as NextDate
    from
        (
        select 1 as Sequence, ProcessId, CreatedDate, Status from history
        union all
        select
            0,
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h1
            inner join
        (
        select 1 as Sequence, ProcessId, CreatedDate, Status from history
        union all
        select
            0,
            ProcessId,
            date_add(date(CreatedDate), interval 1 day),
            substring(
                max(
                    concat(
                        date_format(CreatedDate, get_format(datetime, 'ISO')),
                        Status
                    )
                ), 20, 10) as LastStatus
        from history h0
        where date(CreatedDate) <
            (
                select max(date(CreatedDate)) from history hm
                where hm.ProcessId = h0.ProcessId
            )
        group by ProcessId, date(CreatedDate)
        ) h2
            on      h2.ProcessId   = h1.ProcessId
                and (
                        h1.CreatedDate <  h2.CreatedDate
                    and h2.CreatedDate <= date_add(date(h1.CreatedDate), interval 1 day)
                    or
                        h1.CreatedDate =  h2.CreatedDate
                    and h1.Sequence    <  h2.Sequence
                )   
    group by h1.ProcessId, h1.CreatedDate, h1.Status
    ) hx
group by ProcessId, date(CreatedDate), Status
having TimeSpentSeconds > 0 /* MySQL shortcut reference */
order by ProcessId, `Date`, Status desc, TimeSpentSeconds

http://sqlfiddle.com/#!9/b582b2/10

I just kind of realized that my expressions for NextDate don't need to check for midnight overrun, so I commented that out. I didn't change the fiddles though. I also forgot to mention that I am assuming there's at least one status report per day for each process. Maybe this is a starting point to play around with other MySQL options like temp tables (for speed) or variables (for lead/lag).

I liked your question because it gave me a reason to play around with SQL, which I didn't have a chance to do in a while.

Here is my take on your question.

First, we prepare a temporary table TempStatusLog, where for each day we add a record at 00:00:01 with the status equal to the earliest reading of that day, and a record at 23:59:59 with the status of the latest reading of that day. We also number all the rows using a variable @rownumvar. Assuming that the original table is called StatusLog, the temporary table is created using this SELECT statement:

SELECT @rownumvar := @rownumvar + 1 AS `rowNo`,
       `t`.`ProcessId`, `t`.`CreatedDate`, `t`.`Status`
FROM (SELECT `ProcessId`, `CreatedDate`, `Status`
      FROM   `StatusLog`

      UNION

      SELECT `ProcessId`,
             STR_TO_DATE(CONCAT(`OnDate`, ' 23:59:59'),
                         '%Y-%m-%d %H:%i:%s') AS `CreatedDate`,
            (SELECT `Status`
             FROM   `StatusLog` AS `l`
             WHERE  `l`.`ProcessId` = `t1`.`ProcessId` AND
                    `l`.`CreatedDate`
                      = STR_TO_DATE(CONCAT(`t1`.`OnDate`, ' ', `t1`.`LastStatus`),
                                    '%Y-%m-%d %H:%i:%s')) AS `Status`
      FROM (SELECT `ProcessId`,
                   DATE_FORMAT(`CreatedDate`, '%Y-%m-%d') AS `OnDate`,
                   DATE_FORMAT(MAX(TIME(`CreatedDate`)), '%H:%i:%s') AS `LastStatus`
            FROM   `StatusLog`
            GROUP BY DATE(`OnDate`), `ProcessId`
            ORDER BY `ProcessId`, DATE(`OnDate`)) AS `t1`

      UNION

      SELECT `ProcessId`,
             STR_TO_DATE(CONCAT(`OnDate`, ' 00:00:01'),
                         '%Y-%m-%d %H:%i:%s') AS `CreatedDate`,
            (SELECT `Status`
             FROM   `StatusLog` AS `l`
             WHERE  `l`.`ProcessId` = `t2`.`ProcessId` AND
                    `l`.`CreatedDate`
                      = STR_TO_DATE(CONCAT(`t2`.`OnDate`, ' ', `t2`.`FirstStatus`),
                                    '%Y-%m-%d %H:%i:%s')) AS `Status`
      FROM (SELECT `ProcessId`,
                   DATE_FORMAT(`CreatedDate`, '%Y-%m-%d') AS `OnDate`,
                   DATE_FORMAT(MIN(TIME(`CreatedDate`)), '%H:%i:%s') AS `FirstStatus`
            FROM   `StatusLog`
            GROUP BY DATE(`OnDate`), `ProcessId`
            ORDER BY `ProcessId`, DATE(`OnDate`)) AS `t2`) AS `t`,
     (SELECT @rownumvar := 0) AS `r`
ORDER BY `t`.`ProcessId`, `t`.`CreatedDate` ASC

Now it is relatively easy to calculate how long each process was in each state every day. We select a running window of two rows (this is where the numbered rows come into play) and calculate the time difference between each pair of consecutive readings; these differences are then summed up:

SELECT `p`.`ProcessId`,
       DATE_FORMAT(`q`.`CreatedDate`, '%Y-%m-%d') AS `Day`,
       DATE_FORMAT(
         SEC_TO_TIME(
           SUM(
             TIME_TO_SEC(
               TIMEDIFF(TIME(`q`.`CreatedDate`),
                        TIME(`p`.`CreatedDate`))
             )
           )
         ),
         '%H:%i:%s'
       ) AS `Elapsed`,
       `p`.`Status`
FROM   `TempStatusLog` AS `p`,
       `TempStatusLog` AS `q`
WHERE  `q`.`rowNo` = `p`.`rowNo` + 1 AND
       `q`.`ProcessId` = `p`.`ProcessId` AND  -- never pair rows of different processes
       DATE(`q`.`CreatedDate`) = DATE(`p`.`CreatedDate`)
GROUP BY `Day`, `Status`, `ProcessId`
ORDER BY `Day` ASC, `ProcessId` ASC, `Status` ASC

There are two minor issues with this solution:

  1. It loses 2 seconds every day. I.e., if a process was up the whole day, it will say it was up for 23:59:58.
  2. If the process was up the whole day, there will be no record saying that it was down for 00:00:00 (and vice versa).

To me, both issues seem to be too minor to bother about.
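
To see where the two seconds go, here is a small Python sketch (not the MySQL itself) of the pairing step: each row is matched with its successor, and only pairs on the same calendar day are counted, as in the rowNo = rowNo + 1 self-join above. The function name `elapsed_by_status` and the sample day are hypothetical; the sentinel times 00:00:01 and 23:59:59 are the ones described for TempStatusLog.

```python
from collections import defaultdict
from datetime import datetime


def elapsed_by_status(readings):
    """readings: list of (datetime, status) for ONE process, sorted by time.

    Pairs each row with the next one and sums the gap under the earlier
    row's status, but only when both rows fall on the same calendar day.
    """
    totals = defaultdict(int)
    for (start, status), (end, _) in zip(readings, readings[1:]):
        if start.date() == end.date():
            totals[(start.date(), status)] += int((end - start).total_seconds())
    return dict(totals)


# A process that is UP for the whole day, framed by the two sentinel rows.
day = [
    (datetime(2016, 1, 9, 0, 0, 1), "UP"),     # start-of-day sentinel
    (datetime(2016, 1, 9, 12, 30, 0), "UP"),   # an actual reading
    (datetime(2016, 1, 9, 23, 59, 59), "UP"),  # end-of-day sentinel
]
totals = elapsed_by_status(day)  # 86398 seconds, i.e. 23:59:58
```

The total is 44999 + 41399 = 86398 seconds rather than 86400, which is exactly the 2-second loss from issue 1. Because the sketch works on one process at a time, it also never pairs the last row of one process with the first row of another.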

Here you can take a look at a live demo: http://www.sqlfiddle.com/#!9/0a79cc/1

Note that SQLFiddle does not allow creating temporary tables, so I created a normal table for that purpose.

PS: It was considerably harder to solve this in MySQL than it would have been in almost any other RDBMS, because MySQL (before version 8.0, at least) does not support many features of SQL. For one, it does not support CTEs, which are part of the ANSI SQL spec. This forces users to create temporary tables or find other similar workarounds. Many RDBMSs (Oracle, SQL Server) also support some variation of the ROW_NUMBER() function, which I had to work around using a variable.

Just for fun. Go for Postgres :)

select 
  ProcessId, CreatedDate, Status, 
  to_char( CreatedDate - lag( CreatedDate ) over ( partition by ProcessId order by CreatedDate ), 'HH24:MI' ) as diff
from history
order by ProcessId, ID;

http://sqlfiddle.com/#!15/83cb0/9
