How to extract unique events from a log file and count them?

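I have a SHOUTcast connection log file and want to find out which clients were used and how often. The log file is quite large (about 100 MB) and contains entries from the past 3 years. The log entries look like this (the IPs have been randomized!):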

<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)

I want to extract each unique client and count how often that client was used. For the log above, the result should look like this:

Internet%20Explorer%207   2
WMPlayer/10.0.0.364       1
WinampMPEG/5.50           2
TapinRadio                1

First, I simply filtered out all the irrelevant entries. (Sorry for the use of cat.)

cat shoutcast.log | grep "starting stream" > filtered.txt

The result looks like this:

<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)

But what now? I'm a bit lost: how do I access the information inside the {A: } braces?

Try this awk one-liner:

 awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' input
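To see why $2 ends up holding the client name, it can help to split a single log line by hand. The sketch below is a variant of the one-liner above, not the original command: it writes the separator with bracket expressions ([{] and [}]) on the assumption that this keeps the braces literal in awk implementations that treat { as the start of an interval expression. The variable name line is only for illustration.

```shell
# One "starting stream" entry taken from the sample log.
line='<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)'

# The field separator is a regex matching either "{A: " or "}".
# Everything before "{A: " becomes $1, the client name becomes $2,
# and the trailing "(P: 1)" becomes $3.
printf '%s\n' "$line" |
  awk -F'[{]A: |[}]' '{ print "$1 = " $1; print "$2 = " $2; print "$3 = " $3 }'
```

Note that a client name containing a literal } would be cut off at that character, since } is also a separator.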

Test with your data:

kent$  cat ff
<03/23/13@15:46:25> [dest: 1.187.2.99] starting stream (UID: 25477)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@15:46:34> [dest: 1.187.2.99] connection closed (9 seconds) (UID: 25477)[L: 1]{Bytes: 403705}(P: 1)
<03/23/13@16:24:36> [dest: 1.194.2.16] starting stream (UID: 25478)[L: 2]{A: WMPlayer/10.0.0.364}(P: 1)
<03/23/13@16:40:56> [dest: 1.194.2.16] connection closed (981 seconds) (UID: 25478)[L: 1]{Bytes: 15938209}(P: 1)
<03/23/13@16:41:29> [dest: 1.158.2.39] starting stream (UID: 25479)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@16:41:40> [dest: 1.158.2.39] connection closed (11 seconds) (UID: 25479)[L: 1]{Bytes: 432719}(P: 1)
<03/23/13@17:51:29> [dest: 1.142.2.225] starting stream (UID: 25480)[L: 2]{A: WinampMPEG/5.50}(P: 1)
<03/23/13@18:07:48> [dest: 1.142.2.225] connection closed (979 seconds) (UID: 25480)[L: 1]{Bytes: 15919475}(P: 1)
<03/23/13@18:18:48> [dest: 1.232.2.215] starting stream (UID: 25481)[L: 2]{A: TapinRadio}(P: 1)
<03/23/13@18:19:07> [dest: 1.232.2.215] connection closed (19 seconds) (UID: 25481)[L: 1]{Bytes: 417192}(P: 1)
<03/23/13@18:34:45> [dest: 1.187.2.99] starting stream (UID: 25482)[L: 2]{A: Internet%20Explorer%207}(P: 1)
<03/23/13@18:34:46> [dest: 1.187.2.99] connection closed (2 seconds) (UID: 25482)[L: 1]{Bytes: 282751}(P: 1)

kent$  awk -F'{A: |}' '/starting/{a[$2]++}END{for(x in a)print x" : "a[x]}' ff
WMPlayer/10.0.0.364 : 1
TapinRadio : 1
WinampMPEG/5.50 : 2
Internet%20Explorer%207 : 2
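The order of the lines above is arbitrary, because awk's for (x in a) loop does not guarantee any particular order. For a log covering three years, a list sorted by frequency is probably more useful. A minimal sketch, assuming a POSIX awk and sort are available; the function name count_clients is made up for this example, and the separator is again written with bracket expressions rather than exactly as in the one-liner above:

```shell
# Count clients from "starting stream" entries and list the most
# frequent first. Reads the files given as arguments, or stdin if none.
count_clients() {
  awk -F'[{]A: |[}]' '
    /starting stream/ { n[$2]++ }               # $2 is the text inside {A: ...}
    END { for (c in n) print n[c], c }          # emit "count client"
  ' "$@" |
    LC_ALL=C sort -k1,1nr -k2                   # count descending, then name
}

# Usage (shoutcast.log is the file name from the question):
# count_clients shoutcast.log
```

This also makes the intermediate filtered.txt unnecessary, since the /starting stream/ pattern does the filtering inside awk.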
