Extract part of a column in Awk

Question

I am trying to count the number of occurrences per second of a search term in a log file. I've been using AWK, and the issue is that the timestamp is located in a column with additional information. Is it possible to get the number of occurrences per second by looking only for the time pattern 00:00:00 - 24:00:00?

Data example:

[01/May/2018:23:59:59.532
[01/May/2018:23:59:59.848
[01/May/2018:23:59:59.851
[01/May/2018:23:59:59.911
[01/May/2018:23:59:59.923
[01/May/2018:23:59:59.986
[01/May/2018:23:59:59.988
[01/May/2018:23:59:59.756
[01/May/2018:23:59:59.786
[01/May/2018:23:59:59.883

So far I can extract the data easily enough using:

awk '/00:00:00/,/24:00:00/{if(/search_term/) a[$4]++} END{for(k in a) print k " - " a[k]}' file.log |sort

This will return:

[02/May/2018:10:40:05.903 - 1
[02/May/2018:10:40:05.949 - 1
[02/May/2018:10:40:05.975 - 1
[02/May/2018:10:40:05.982 - 2
[02/May/2018:10:40:06.022 - 1
[02/May/2018:10:40:06.051 - 1
[02/May/2018:10:40:06.054 - 1
[02/May/2018:10:40:06.086 - 1
[02/May/2018:10:40:06.094 - 1
[02/May/2018:10:40:06.126 - 1

What I'm aiming for is something more like:

10:40:05 - 5
10:40:06 - 6

I have no idea if I'm even thinking about this correctly. I'm new to AWK in general.

Answer 1

Use colon and dot as the field separators. The hours are then in $2, the minutes in $3 and the seconds in $4:

awk -F'[:.]' '
    {count[$2 ":" $3 ":" $4]++} 
    END {for (time in count) print time " - " count[time]}
' file

10:40:05 - 4
10:40:06 - 6

The output will not necessarily be sorted. If you're using GNU awk, use

END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (time in count) 
        print time " - " count[time]
}

(reference), or simply pipe the output through sort (| sort).
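The question also filters on a search term and reads the timestamp from the 4th whitespace-separated column ($4), both of which the command above drops. A minimal sketch that puts them back, assuming the log layout implied by the question's $4: instead of changing FS for the whole line, it calls split() on that column only, so other columns containing "." or ":" (an IP address, for instance) cannot shift the fields. count_per_second is a hypothetical helper name, and LC_ALL=C sort stands in for the gawk-only PROCINFO setting so the sketch works in any POSIX awk.

```shell
# Hypothetical helper: count lines matching a search term per second.
# Assumes (as in the question) the timestamp is the 4th whitespace-separated
# column, e.g. [02/May/2018:10:40:05.903, and reads the log from stdin.
# $1 is the search term, treated as an awk regular expression.
count_per_second() {
    awk -v term="$1" '
        $0 ~ term {
            # Split only the timestamp column on ":" and ".":
            # t[2] = hours, t[3] = minutes, t[4] = seconds.
            split($4, t, "[:.]")
            count[t[2] ":" t[3] ":" t[4]]++
        }
        END { for (time in count) print time " - " count[time] }
    ' | LC_ALL=C sort
}
```

Usage: count_per_second search_term < file.log. Because the HH:MM:SS keys are fixed-width and zero-padded, a plain byte-order sort also sorts them chronologically within a day.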

Answer 2

One thing you can do is this:

awk 'BEGIN{FIELDWIDTHS = "1 11 1 12"} {print $4}' datetimes

Specify the field widths, and $4 will then hold the time (for example 23:59:59.532). If you don't care about milliseconds, use "1 11 1 8 4" instead. Note that FIELDWIDTHS is a GNU awk (gawk) feature.
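In awks other than gawk, the same fixed-width cut can be done with substr(). Here is a sketch that also does the per-second counting the question asks for. It assumes every line starts with exactly the 13-character prefix [DD/Mon/YYYY:, as in the sample data, so HH:MM:SS always occupies characters 14 to 21. count_fixed_width is a hypothetical name.

```shell
# Hypothetical helper: portable stand-in for FIELDWIDTHS = "1 11 1 8 4".
# Assumes every line starts with the 13-character prefix "[DD/Mon/YYYY:",
# so HH:MM:SS is always characters 14-21. Reads lines from stdin.
count_fixed_width() {
    awk '
        # Characters 14-21 hold HH:MM:SS; use them as the array key.
        { count[substr($0, 14, 8)]++ }
        END { for (time in count) print time " - " count[time] }
    ' | LC_ALL=C sort
}
```

With the question's ten sample lines (all stamped 23:59:59) as input, count_fixed_width < datetimes prints 23:59:59 - 10.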

Answer 3

You can use substr() on the line to build the array index. For example, given this file:

cat 1.txt
[01/May/2018:23:59:59.532
[01/May/2018:01:59:59.848
[01/May/2018:02:59:59.851
[01/May/2018:02:59:59.911
[01/May/2018:02:59:59.923
[01/May/2018:02:00:59.986

you can use an awk command like this:

cat 1.txt | awk '{a[substr($0,index($0,":")+1,8)]++} END{for(i in a) print i" - "a[i]}'

where substr($0,index($0,":")+1,8) takes the 8 characters after the first ":" (the HH:MM:SS part) and uses them as the array index.
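Taking the characters after the first ":" only works if that colon belongs to the timestamp. The variation below is closer to what the question asks: it searches for the time pattern itself. match() finds the first ":" followed by HH:MM:SS and sets RSTART to its position. The sketch assumes the time is written after a colon, as in [01/May/2018:23:59:59.532, and skips lines that contain no such time. The pattern is spelled out as [0-9][0-9] rather than [0-9]{2}, because some older awk builds do not support interval expressions. count_by_time_pattern is a hypothetical name.

```shell
# Hypothetical helper: key on the time pattern itself instead of on the
# first ":" in the line. Assumes the time is written as ":HH:MM:SS", as in
# [01/May/2018:23:59:59.532; lines with no such time are skipped.
count_by_time_pattern() {
    awk '
        # match() sets RSTART to where ":HH:MM:SS" begins (RLENGTH is 9).
        match($0, /:[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            # Skip the leading ":" and keep the 8 characters HH:MM:SS.
            count[substr($0, RSTART + 1, 8)]++
        }
        END { for (time in count) print time " - " count[time] }
    ' | LC_ALL=C sort
}
```

Run against the file above as count_by_time_pattern < 1.txt, it prints 01:59:59 - 1, 02:00:59 - 1, 02:59:59 - 3 and 23:59:59 - 1, one per line.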

3 answers

Answer 1
Score 3, accepted, 2018-05-02 21:28:04

Answer 2
Score -1, 2018-05-02 21:25:04

Answer 3
Score -1, 2018-05-03 00:26:13