
Extract part of Column Awk

I am trying to count the number of occurrences per second of a search term in a log file. I've been using AWK, but the timestamp is located in a column together with additional information. Is it possible to get the number of occurrences per second by only looking for the time pattern 00:00:00 - 24:00:00?

Data example:

[01/May/2018:23:59:59.532
[01/May/2018:23:59:59.848
[01/May/2018:23:59:59.851
[01/May/2018:23:59:59.911
[01/May/2018:23:59:59.923
[01/May/2018:23:59:59.986
[01/May/2018:23:59:59.988
[01/May/2018:23:59:59.756
[01/May/2018:23:59:59.786
[01/May/2018:23:59:59.883

So far I can extract the data easily enough using:

awk '/00:00:00/,/24:00:00/{if(/search_term/) a[$4]++} END{for(k in a) print k " - " a[k]}' file.log |sort

This will return:

[02/May/2018:10:40:05.903 - 1
[02/May/2018:10:40:05.949 - 1
[02/May/2018:10:40:05.975 - 1
[02/May/2018:10:40:05.982 - 2
[02/May/2018:10:40:06.022 - 1
[02/May/2018:10:40:06.051 - 1
[02/May/2018:10:40:06.054 - 1
[02/May/2018:10:40:06.086 - 1
[02/May/2018:10:40:06.094 - 1
[02/May/2018:10:40:06.126 - 1

What I'm aiming for is more like:

10:40:05 - 5
10:40:06 - 6

No idea if I'm even thinking about this correctly. I'm new to AWK in general.

Use colon and dot as the field separators; the hours are then in field 2, the minutes in field 3, and the seconds in field 4:

awk -F'[:.]' '
    {count[$2 ":" $3 ":" $4]++} 
    END {for (time in count) print time " - " count[time]}
' file
10:40:05 - 4
10:40:06 - 6

The output will not necessarily be sorted. If you're using GNU awk, use

END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (time in count) 
        print time " - " count[time]
}

(reference), or simply pipe the output to | sort
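To tie this back to the question, where the timestamp sits in the fourth whitespace-separated column and only lines containing a search term should be counted, the same colon-and-dot split can be applied to that column alone with split(). This is only a sketch under assumptions: the sample log lines, the file name sample.log, the function name count_per_second, and the term search_term are made up for illustration, and the timestamp is assumed to be in $4 as in the question's command.

```shell
# Hypothetical sample log; the timestamp is the 4th whitespace-separated field.
cat > sample.log <<'EOF'
10.0.0.1 - - [02/May/2018:10:40:05.903 GET /a search_term
10.0.0.2 - - [02/May/2018:10:40:05.949 GET /b other
10.0.0.1 - - [02/May/2018:10:40:05.975 GET /c search_term
10.0.0.3 - - [02/May/2018:10:40:06.022 GET /d search_term
EOF

# Count lines containing a literal term, grouped by HH:MM:SS from field 4.
# Usage: count_per_second TERM FILE   (illustrative helper, not a standard tool)
count_per_second() {
    awk -v term="$1" '
        index($0, term) {
            # t[1] = "[02/May/2018", t[2] = hours, t[3] = minutes, t[4] = seconds
            split($4, t, /[:.]/)
            count[t[2] ":" t[3] ":" t[4]]++
        }
        END { for (time in count) print time " - " count[time] }
    ' "$2" | sort
}

count_per_second search_term sample.log
```

On the sample above this prints "10:40:05 - 2" and "10:40:06 - 1". Splitting only $4 rather than the whole line keeps colons or dots elsewhere on the line (IP addresses, URLs) from shifting the field numbers.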

One thing you can do is this:

awk 'BEGIN{FIELDWIDTHS = "1 11 1 12"} {print $4}' datetimes

Specifying the field widths gives you the time, including milliseconds, in $4 (FIELDWIDTHS is a GNU awk extension). If you don't care about milliseconds, use "1 11 1 8 4" instead.

You can use a substr of the line as the index of an array. For example, given this file:

cat 1.txt
[01/May/2018:23:59:59.532
[01/May/2018:01:59:59.848
[01/May/2018:02:59:59.851
[01/May/2018:02:59:59.911
[01/May/2018:02:59:59.923
[01/May/2018:02:00:59.986

you can use an awk command like this:

cat 1.txt | awk '{a[substr($0,index($0,":")+1,8)]++} END{for(i in a) print i" - "a[i]}'

where substr($0,index($0,":")+1,8) takes the 8 characters that follow the first ":" and uses them as the index of the array.
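The question also asks whether the time can be found purely by its pattern. One way to sketch that in POSIX awk is match(), which sets RSTART to the start of the first match, so neither a column number nor the position of the first ":" has to be known. The function name count_by_time_pattern and the file name times.txt are illustrative, and the sketch assumes every timestamp carries fractional seconds as in the sample data.

```shell
# Sample data copied from the answer above.
cat > times.txt <<'EOF'
[01/May/2018:23:59:59.532
[01/May/2018:01:59:59.848
[01/May/2018:02:59:59.851
[01/May/2018:02:59:59.911
[01/May/2018:02:59:59.923
[01/May/2018:02:00:59.986
EOF

# Count lines per second, locating HH:MM:SS by pattern rather than position.
count_by_time_pattern() {
    awk '
        # Requiring ".digit" after the seconds stops the regex from matching
        # "18:23:59", which would otherwise be found first inside "2018:23:59:59".
        match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]/) {
            a[substr($0, RSTART, 8)]++   # keep only the 8 characters of HH:MM:SS
        }
        END { for (i in a) print i " - " a[i] }
    ' "$1" | sort
}

count_by_time_pattern times.txt
```

Lines without a matching timestamp are skipped rather than counted under a garbage key, which the fixed-offset approaches do not guarantee.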
