
SQL Query to Group “Blocks” of Date/Time Activity?

I have data in a table that is log data for server activity. Here's what it looks like (the # column isn't part of the database or output; it is only there so I can refer to rows in my Notes below):

# | DateStamp           | Server  
- | ------------------- | ---------  
1 | 2016-12-01 03:15:19 | Server 1
2 | 2016-12-01 03:17:19 | Server 2
3 | 2016-12-01 03:17:24 | Server 2
4 | 2016-12-01 03:18:01 | Server 1
5 | 2016-12-01 03:18:07 | Server 3
6 | 2016-12-01 04:01:03 | Server 3
7 | 2016-12-01 07:18:47 | Server 1
8 | 2016-12-01 07:19:23 | Server 1
9 | 2016-12-01 09:19:39 | Server 2
10| 2016-12-01 11:19:54 | Server 3

And I want to write a query that outputs:

# | Server   | Online              | Offline
- | -------- | ------------------- | -------------------
1 | Server 1 | 2016-12-01 03:15:19 | 2016-12-01 03:18:01
2 | Server 2 | 2016-12-01 03:17:19 | 2016-12-01 03:17:24
3 | Server 3 | 2016-12-01 03:18:07 | 2016-12-01 03:18:07
4 | Server 1 | 2016-12-01 07:18:47 | 2016-12-01 07:19:23
5 | Server 2 | 2016-12-01 09:19:39 | 2016-12-01 09:19:39
6 | Server 3 | 2016-12-01 11:19:54 | (still online)

Notes:

  • This is basically a tally of when these servers were "active" and for how long.
  • If a server's next activity is more than an hour after its previous activity, it is considered a new session and gets a new line (i.e. rows 1 and 4 of the output, based on data rows 4 and 7 above).
  • To clarify: line 1 of the output treats 03:18:01 as "offline" because the next entry for Server 1 (at 07:18:47, data row 7) is more than an hour later.
  • Line 5 of the output shows an offline time because more than an hour has passed and no new entries for Server 2 have appeared.
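The rule in these notes can be sketched outside the database to check what a query should return. This is a minimal Python sketch, not the asker's or answerer's method: `sessionize`, `parse`, `GAP`, and the query time `NOW` are hypothetical names and values chosen for illustration. One thing it makes visible: under the one-hour rule as stated, data row 6 (Server 3 at 04:01:03) is only 43 minutes after row 5, so the sketch reports 04:01:03 as the end of Server 3's first session rather than the 03:18:07 shown in row 3 of the desired output.

```python
from datetime import datetime, timedelta
from itertools import groupby

GAP = timedelta(hours=1)  # assumption: a gap longer than this starts a new session


def sessionize(rows, now):
    """rows: iterable of (timestamp, server).

    Returns (server, online, offline) tuples sorted by online time;
    offline is None when the last session is still within GAP of `now`.
    """
    sessions = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for server, group in groupby(ordered, key=lambda r: r[1]):
        start = prev = None
        for stamp, _ in group:
            if prev is not None and stamp - prev > GAP:
                sessions.append((server, start, prev))  # close the previous session
                start = stamp
            elif start is None:
                start = stamp
            prev = stamp
        # The last session counts as offline only once more than GAP has elapsed.
        offline = prev if now - prev > GAP else None
        sessions.append((server, start, offline))
    return sorted(sessions, key=lambda s: s[1])


def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# The ten data rows from the question.
LOG = [
    (parse("2016-12-01 03:15:19"), "Server 1"),
    (parse("2016-12-01 03:17:19"), "Server 2"),
    (parse("2016-12-01 03:17:24"), "Server 2"),
    (parse("2016-12-01 03:18:01"), "Server 1"),
    (parse("2016-12-01 03:18:07"), "Server 3"),
    (parse("2016-12-01 04:01:03"), "Server 3"),
    (parse("2016-12-01 07:18:47"), "Server 1"),
    (parse("2016-12-01 07:19:23"), "Server 1"),
    (parse("2016-12-01 09:19:39"), "Server 2"),
    (parse("2016-12-01 11:19:54"), "Server 3"),
]
NOW = parse("2016-12-01 12:00:00")  # hypothetical query time, not from the post

for n, (server, online, offline) in enumerate(sessionize(LOG, NOW), start=1):
    print(n, server, online, offline or "(still online)")
```

The "(still online)" case depends entirely on the query time, so any SQL version of this rule would need the current time (for example `NOW()`) as an input as well.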

I would love to know how to query for this, and group my results based on the Output and Notes above. Let me know if you need more information to suggest a solution.

1) First of all, load your logs into a MySQL table:

# Optionally
#drop table if exists srv_logs;

create table srv_logs (
        `id` INT(10) NOT NULL AUTO_INCREMENT,
        `datetime` DATETIME ,
        `server` VARCHAR(300),
    PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
LOAD DATA INFILE 'yourfile.log'
    INTO TABLE srv_logs
    CHARSET utf8
    FIELDS TERMINATED BY '|'  
    OPTIONALLY ENCLOSED BY '"'   
    LINES TERMINATED BY '\n'  IGNORE 2 LINES (
        `id`,
        `datetime`,
        `server`
    );

2) Create the downtime table and fill it with initial data (the start of each server's most recent session):

create table srv_downtime (
        `id` INT(10) NOT NULL AUTO_INCREMENT,
        `server` VARCHAR(300),
        `online` DATETIME ,
        `offline` DATETIME ,
    PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

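# Seed with the start of each server's most recent session: a log entry that
# has no earlier entry from the same server less than 60 minutes before it.
insert into  srv_downtime (`server`, `online`, `offline`)
SELECT l.server, MAX(l.datetime), null
FROM srv_logs l
left join srv_logs l2 
   on l.server = l2.server 
   and l.datetime > l2.datetime
   and TIMESTAMPDIFF(MINUTE,l2.datetime,l.datetime) < 60
where l2.id is null 
GROUP BY l.server;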

3) Repeatedly run the following insert until it adds no new rows. Each run appends, for every server, the session that precedes the earliest one found so far (the previous work period):

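    insert into  srv_downtime (`server`, `online`, `offline`)
    select a.server, max(s.datetime), a.offline
    from (
        # offline = last log entry before the earliest session found so far
        select d.server, max(l.datetime) as offline
        from (select server, min(online) as online
                from srv_downtime group by server) d
        join srv_logs l
            on l.server = d.server
            and l.datetime < d.online
        group by d.server
    ) a
    # s = session starts at or before that offline time; the latest one wins
    join srv_logs s
        on s.server = a.server
        and s.datetime <= a.offline
    left join srv_logs p
        on p.server = s.server
        and p.datetime < s.datetime
        and TIMESTAMPDIFF(MINUTE, p.datetime, s.datetime) < 60
    where p.id is null
    group by a.server, a.offline;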
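The idea behind steps 2 and 3 (find the latest session start, then walk backwards one session at a time) can be modelled for a single server in plain Python. This is only an illustrative sketch: `session_starts` and `walk_back` are hypothetical helper names, and the integer-minute comparison is meant to mirror `TIMESTAMPDIFF(MINUTE, ...) < 60` for positive gaps, which is an assumption about how closely the model needs to track MySQL.

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)


def session_starts(times, gap_minutes=60):
    """Entries with no earlier entry less than gap_minutes before them.

    This mirrors the anti-join (left join ... where l2.id is null) in step 2.
    """
    times = sorted(times)
    return [
        t for t in times
        if not any(p < t and (t - p) // ONE_MINUTE < gap_minutes for p in times)
    ]


def walk_back(times, gap_minutes=60):
    """Yield (online, offline) pairs from the latest session to the earliest."""
    times = sorted(times)
    if not times:
        return
    starts = session_starts(times, gap_minutes)
    online, offline = starts[-1], None  # step 2: latest session, end unknown
    while True:
        yield online, offline
        earlier = [t for t in times if t < online]
        if not earlier:
            return  # nothing before this session: the repeat loop stops here
        offline = max(earlier)  # last entry of the previous session
        online = max(s for s in starts if s <= offline)  # its start


def parse(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Server 1's entries from the example data.
server1 = [parse(t) for t in (
    "2016-12-01 03:15:19",
    "2016-12-01 03:18:01",
    "2016-12-01 07:18:47",
    "2016-12-01 07:19:23",
)]

for online, offline in walk_back(server1):
    print(online, offline)
```

Each iteration of the loop corresponds to one run of the step 3 insert; the generator ends at the point where the insert would add no rows.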

On the example dataset, these three steps produce the following result, shown ordered by server and online time (the latest session of each server keeps a NULL offline value):

Server 1  | 2016-12-01 03:15:19 | 2016-12-01 03:18:01
Server 1  | 2016-12-01 07:18:47 | NULL
Server 2  | 2016-12-01 03:17:19 | 2016-12-01 03:17:24
Server 2  | 2016-12-01 09:19:39 | NULL
Server 3  | 2016-12-01 03:18:07 | 2016-12-01 04:01:03
Server 3  | 2016-12-01 11:19:54 | NULL
