
Analyzing time tracking data in Linux

I have a log file containing a time series of events. Now, I want to analyze the data to count the number of events in different intervals. Each entry shows that an event occurred at that timestamp. For example, here is a part of the log file:

09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55

I need to count the events in 5-minute intervals. The result should look like:

09:00  5       //which means 5 events from time 09:00:00 until 09:04:59
09:05  5       //which means 5 events from time 09:05:00 until 09:09:59

and so on.

Do you know any trick in bash, shell, awk, ...?
Any help is appreciated.

awk to the rescue.

awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' file

Explanation

It takes the values of the 1st, 2nd, 4th and 5th characters of every line and keeps track of how many times each combination has appeared. To group minutes into the 0-4 and 5-9 ranges, it creates the variable min, which is 0 in the first case and 5 in the second.
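As a quick way to see exactly which characters those fields are (note that an empty FS splitting each character into its own field is a GNU awk behavior, so this sketch assumes gawk):

$ echo "09:03:04" | awk -v FS="" '{print $1, $2, $4, $5}'
0 9 0 3

$1$2 is the hour and $4 is the tens digit of the minutes, so once min folds the ones digit down to 0 or 5, the key 0900 covers 09:00 through 09:04.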

Sample

With your input:

$ awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' a
0900 5
0905 5

With another sample input:

$ cat a
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
09:18:55
09:19:55
10:09:55
10:19:55

$ awk -v FS="" '{min=$5<5?0:5; a[$1$2$4min]++} END{for (i in a) print i, a[i]}' a
0900 5
0905 5
0915 2
1005 1
1015 1

Another way with awk:

awk -F : '{t=sprintf ("%02d",int($2/5)*5);a[$1 FS t]++}END{for (i in a) print i,a[i]}' file |sort -t: -k1n -k2n

09:00 5
09:05 5

Explanation:

use : as the field separator
int($2/5)*5 groups the minutes into 5-minute buckets (00, 05, 10, 15, ...); see the example below
a[$1 FS t]++ counts the events per bucket
the final sort command outputs the buckets in time order
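To see the rounding trick in isolation (a standalone check, not part of the original answer):

$ echo "09:07:42" | awk -F: '{printf "%02d\n", int($2/5)*5}'
05

Integer division by 5 and multiplication back floors the minute to the nearest multiple of 5, and %02d restores the leading zero so the keys sort cleanly.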

Perl with output piped through uniq, just for fun (uniq -c counts adjacent duplicate lines, so this relies on the log already being in time order):

$ cat file
09:00:00
09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08
09:06:46
09:07:42
09:08:55
09:18:55
09:19:55
10:09:55
10:19:55
11:21:00

Command:

perl -F: -lane 'print $F[0].sprintf(":%02d",int($F[1]/5)*5);' file | uniq -c

Output:

   5 09:00
   5 09:05
   2 09:15
   1 10:05
   1 10:15
   1 11:20

Or just Perl:

perl -F: -lane '$t=$F[0].sprintf(":%02d",int($F[1]/5)*5); $c{$t}++; END { print join(" ", $_, $c{$_}) for sort keys %c }' file

Output:

09:00 5
09:05 5
09:15 2
10:05 1
10:15 1
11:20 1
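All of the answers so far hard-code the 5-minute window. Since the question asks about counting over "different intervals", here is a sketch of the same awk idea with the width as a variable (the w parameter, in minutes, is my own addition; it should divide 60 evenly, or the buckets will restart at each hour boundary):

awk -F: -v w=15 '{t=sprintf("%02d", int($2/w)*w); a[$1 FS t]++} END{for (i in a) print i, a[i]}' file | sort -t: -k1n -k2n

With w=15 the sample above collapses into quarter-hour buckets (09:00, 09:15, 10:00, 10:15, ...).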

I realize this is an old question, but when I stumbled onto it I couldn't resist poking at it from another direction...

sed -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' | uniq -c

In this form it assumes the data comes from standard input; alternatively, add the filename as the final argument before the pipe.

It's not unlike Michal's initial approach, but if you happen to need a quick and dirty analysis of a huge log, sed is a lightweight and capable tool.

The assumption is that the data truly is in a regular format; any hiccups will appear in the result.
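If you want to test that assumption before trusting the counts, a quick sanity check (a deliberately loose pattern, my own sketch) is to print any line that is not exactly an HH:MM:SS timestamp:

grep -v -x '[0-2][0-9]:[0-5][0-9]:[0-5][0-9]' file

Any output from this means the tallies below will be off.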

As a breakdown, given the input

09:00:35
09:01:20
09:02:51
09:03:04
09:05:12
09:06:08

and applying each edit clause individually, the intermediate results are as follows: 1) Eliminate the first colon:

-e 's/:/ /'
09 00:35
09 01:20
09 02:51
09 03:04
09 05:12
09 06:08

2) Transform minutes 0 through 4 to 0:

-e 's/[0-4]:.*$/0/'
09 00
09 00
09 00
09 00
09 05:12
09 06:08

3) Transform minutes 5 through 9 to 5:

-e 's/[5-9]:.*$/5/'
09 00
09 00
09 00
09 00
09 05
09 05

Steps 2 and 3 also delete all trailing content from the lines, which would otherwise make the lines non-unique (and hence uniq -c would fail to produce the desired results).

Perhaps the biggest strength of using sed as the front end is that you can select only the lines of interest, for example, root logging in remotely:

sed -e '/sshd.*: Accepted .* for root from/!d' -e 's/:/ /' ... /var/log/secure
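Since syslog-style lines do not begin with the timestamp, one extra clause can isolate HH:MM:SS before the bucketing edits run; a sketch, assuming the usual 'Mon DD HH:MM:SS host sshd[pid]: ...' layout rather than the author's elided command:

sed -e '/sshd.*: Accepted .* for root from/!d' \
    -e 's/^.*\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\).*/\1/' \
    -e 's/:/ /' -e 's/[0-4]:.*$/0/' -e 's/[5-9]:.*$/5/' /var/log/secure | uniq -c

The second clause keeps only the HH:MM:SS-shaped string (the last one on the line, if several happen to match, since the leading .* is greedy); everything after that is the same pipeline as above.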
