How to get frequency of logging using bash if each line contains a timestamp?

I have a program that writes to a text file while it runs. In this text file, each line consists of 5 parts.

  1. Thread ID (a number)
  2. A date in the format yyyy-mm-dd
  3. A timestamp in the format 12:34:56.123456
  4. A function name
  5. Some useful comments printed out by the program

An example log line looks like this:

127894 2020-07-30 22:04:30.234124 foobar caught an unknown exception
127895 2020-07-30 22:05:30.424134 foobar clearing the programs cache
127896 2020-07-30 22:06:30.424134 foobar recalibrating dankness

The logs are printed in chronological order, and I would like to know how to find when the logging frequency is highest. For example, I want to know at what minute or second of the day the program has the highest congestion.

Ideally, I'd like an output that tells me, for example, "The highest logging frequency is between 22:04:00 and 22:05:00 with 10 log lines printed in this timeframe".

Let's consider this test file:

$ cat file.log 
127894 2020-07-30 22:04:30.234124 foobar caught an unknown exception
127895 2020-07-30 22:05:20.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:30.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:40.424134 foobar clearing the programs cache
127896 2020-07-30 22:06:30.424134 foobar recalibrating dankness
127896 2020-07-30 22:06:40.424134 foobar recalibrating dankness

To get the most congested minutes, ranked in order:

$ awk '{sub(/:[^:]*$/, "", $3); a[$2" "$3]++} END{for (d in a)print a[d], d}' file.log | sort -nr
3 2020-07-30 22:05
2 2020-07-30 22:06
1 2020-07-30 22:04

22:05 appeared three times in the log file and is thus the most congested minute, followed by 22:06.

To get only the single most congested minute, add head. For example:

$ awk '{sub(/:[^:]*$/, "", $3); a[$2" "$3]++} END{for (d in a)print a[d], d}' file.log | sort -nr | head -1
3 2020-07-30 22:05

Note that we select based on the second and third fields. The presence of dates or times in the text of log messages will not confuse this code.
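The question asked for a sentence rather than a bare count. A minimal sketch of how the top line of the pipeline could be turned into one is shown here. The variable names (log, top, count, day, minute, summary) and the temporary file are illustrative assumptions, not part of the original answer, and the wording reports the start of the busiest minute instead of computing its end.

```shell
# Sample data from the answer, written to a temporary file (illustrative path).
log=$(mktemp)
cat > "$log" <<'EOF'
127894 2020-07-30 22:04:30.234124 foobar caught an unknown exception
127895 2020-07-30 22:05:20.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:30.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:40.424134 foobar clearing the programs cache
127896 2020-07-30 22:06:30.424134 foobar recalibrating dankness
127896 2020-07-30 22:06:40.424134 foobar recalibrating dankness
EOF

# Same pipeline as in the answer; keep only the busiest minute.
top=$(awk '{sub(/:[^:]*$/, "", $3); a[$2" "$3]++} END{for (d in a)print a[d], d}' "$log" | sort -nr | head -1)

# Split "count date HH:MM" into three shell variables.
read -r count day minute <<EOF
$top
EOF

summary="The highest logging frequency is in the minute starting $day $minute:00 with $count log lines printed"
echo "$summary"
rm -f "$log"
```

If several minutes share the highest count, head -1 keeps whichever of them sort -nr happens to place first.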

How it works

sub(/:[^:]*$/, "", $3) removes the last colon and everything after it from the third field, leaving only hours and minutes.

a[$2" "$3]++ counts the number of times that each date and time (up to minutes) appeared.

After the whole file has been read, for (d in a)print a[d], d prints out the count followed by the date and minute, for every minute observed.

sort -nr sorts the output with the highest count at the top. (Alternatively, we could have awk do the sorting, but sort -nr is simple and portable.)
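If only the single busiest minute is needed, awk can track the maximum while it reads, so no sort is required. The following is a sketch under that assumption, not the original answer's method. The names key, max, best, and busiest are illustrative.

```shell
# Sample data from the answer, written to a temporary file (illustrative path).
log=$(mktemp)
cat > "$log" <<'EOF'
127894 2020-07-30 22:04:30.234124 foobar caught an unknown exception
127895 2020-07-30 22:05:20.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:30.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:40.424134 foobar clearing the programs cache
127896 2020-07-30 22:06:30.424134 foobar recalibrating dankness
127896 2020-07-30 22:06:40.424134 foobar recalibrating dankness
EOF

busiest=$(awk '{
    sub(/:[^:]*$/, "", $3)          # truncate the timestamp to HH:MM
    key = $2 " " $3                 # date plus minute
    # Count the line and remember the key whenever it sets a new maximum.
    if (++a[key] > max) { max = a[key]; best = key }
} END { if (NR) print max, best }' "$log")

echo "$busiest"
rm -f "$log"
```

On a tie, this version reports the minute that reached the maximum count first in the file.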

To sort down to the second

Instead of minutes resolution, we can get seconds resolution:

$ awk '{sub(/\.[^.]*$/, "", $3); a[$2" "$3]++} END{for (d in a)print a[d], d}' file.log | sort -nr
1 2020-07-30 22:06:40
1 2020-07-30 22:06:30
1 2020-07-30 22:05:40
1 2020-07-30 22:05:30
1 2020-07-30 22:05:20
1 2020-07-30 22:04:30
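Because the timestamp has a fixed layout, the resolution can also be chosen by keeping a prefix of the third field instead of using sub. This is a sketch with a hypothetical variable n, which is not in the original answer: n=2 groups by hour, n=5 by minute, and n=8 by second.

```shell
# Sample data from the answer, written to a temporary file (illustrative path).
log=$(mktemp)
cat > "$log" <<'EOF'
127894 2020-07-30 22:04:30.234124 foobar caught an unknown exception
127895 2020-07-30 22:05:20.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:30.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:40.424134 foobar clearing the programs cache
127896 2020-07-30 22:06:30.424134 foobar recalibrating dankness
127896 2020-07-30 22:06:40.424134 foobar recalibrating dankness
EOF

n=2   # hypothetical knob: number of timestamp characters to keep (2 = hour)

# Count lines per "date + first n characters of the timestamp".
top=$(awk -v n="$n" '{a[$2" "substr($3, 1, n)]++} END{for (d in a)print a[d], d}' "$log" | sort -nr | head -1)

echo "$top"
rm -f "$log"
```

With the test file, all six lines fall in hour 22, so the hourly bucket reports a count of 6.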

With GNU utilities:

grep -o ' [0-9][0-9]:[0-9][0-9]' file.log | sort | uniq -c | sort -nr | head -n 1

Prints

frequency  HH:MM

HH:MM is the hour and minute at which the highest frequency occurs, and frequency is the number of log lines in that minute. If you drop the | head -n 1, you will see the full list of frequencies and minutes, ordered by frequency.
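The grep pattern keys on HH:MM alone, so the same minute on different days is counted together, and a time that appears in a message could also match. A possible variant that selects by field position and keeps the date is sketched here. It assumes fields are separated by single spaces, as in the sample, and the variable names are illustrative.

```shell
# Sample data from the answer, written to a temporary file (illustrative path).
log=$(mktemp)
cat > "$log" <<'EOF'
127894 2020-07-30 22:04:30.234124 foobar caught an unknown exception
127895 2020-07-30 22:05:20.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:30.424134 foobar clearing the programs cache
127895 2020-07-30 22:05:40.424134 foobar clearing the programs cache
127896 2020-07-30 22:06:30.424134 foobar recalibrating dankness
127896 2020-07-30 22:06:40.424134 foobar recalibrating dankness
EOF

# Fields 2 and 3 are the date and timestamp; the first 16 characters of
# "yyyy-mm-dd HH:MM:SS.ffffff" are "yyyy-mm-dd HH:MM".
top=$(cut -d' ' -f2,3 "$log" | cut -c1-16 | sort | uniq -c | sort -nr | head -n 1)

echo "$top"
rm -f "$log"
```

As with the grep version, uniq -c pads the count with leading spaces, so the line printed is the count followed by the date and HH:MM.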
