
Awk printing out smallest and highest number, in a time format

I'm fairly new to the Linux/bash shell and I'm having trouble printing two values (the highest and lowest) from a particular column in a text file. The file is formatted like this:

Geoff        Audi           2:22:35.227
Bob          Mercedes       1:24:22.338
Derek        Jaguar         1:19:77.693
Dave         Ferrari        1:08:22.921

As you can see, the final column is a timing. I'm trying to use awk to print out the highest and lowest timing in that column. I'm really stumped. I've tried:

awk '{print sort -n <  $NF}' timings.txt 

However, that didn't seem to sort anything. I just received this output:

1
0
1
0
...

This repeated over and over. The actual output went on for longer, but the first few iterations are enough to show the pattern.

My desired output would be:

Min: 1:08:22.921
Max: 2:22:35.227   

After question clarifications: if the time field always has the same number of digits in the same places, e.g. h:mm:ss.ss, the solution can be drastically simplified. Namely, we no longer need to convert the time to seconds to compare it; a simple string (lexicographical) comparison is enough:

$ awk 'NR==1 {m=M=$3} {$3<m&&m=$3; $3>M&&M=$3} END {printf("min: %s\nmax: %s",m,M)}' file
min: 1:08:22.921
max: 2:22:35.227

The logic is the same as in the (previous) script below; it just uses a simpler string-based comparison to order the values (to determine min/max). We can do that because we know all timings conform to the same format, so if a < b (for example "1:22:33" < "1:23:00") we know a is "smaller" than b. (If the values are not consistently formatted, lexicographical comparison alone cannot order them correctly: for example, "12:00:00" < "3:00:00" holds as a string comparison.)
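Both cases can be checked directly. This is only a small sketch: awk compares the quoted constants as strings and prints 1 when a comparison is true. The variable name result exists only for this illustration.

```shell
# String constants in awk are compared character by character.
result=$(awk 'BEGIN {
    print ("1:22:33" < "1:23:00")    # same width: text order matches time order
    print ("12:00:00" < "3:00:00")   # different width: "1" sorts before "3", so this is true as well
}')
echo "$result"
```

Both lines print 1, so with inconsistent widths the string comparison would rank twelve hours below three hours.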

So, when the first value is read (first record, NR==1), we set the initial min/max to that timing (the 3rd field). For each record we test whether the current value is smaller than the current min, and if it is, we set the new min. Similarly for the max. We use short-circuiting instead of if to make the expressions shorter ($3<m && m=$3 is equivalent to if ($3<m) m=$3). In the END block we simply print the result.
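For comparison, here is the same logic spelled out with explicit if statements, as a minimal sketch. It assumes the four sample rows from the question, written to a temporary file; the names tmp, result, min and max are chosen for the illustration.

```shell
# Sample rows from the question, written to a temporary file for the demo.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Geoff        Audi           2:22:35.227
Bob          Mercedes       1:24:22.338
Derek        Jaguar         1:19:77.693
Dave         Ferrari        1:08:22.921
EOF

result=$(awk '
NR == 1 { min = max = $3 }      # seed both values from the first record
{
    if ($3 < min) min = $3      # string comparison, since $3 does not look like a number
    if ($3 > max) max = $3
}
END { printf "min: %s\nmax: %s\n", min, max }
' "$tmp")
rm -f "$tmp"
echo "$result"
```

On that sample this should report 1:08:22.921 as the min and 2:22:35.227 as the max.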


Here's a general awk solution that accepts time strings with a variable number of digits for hours/minutes/seconds per record:

$ awk '{split($3,t,":"); s=t[3]+60*(t[2]+60*t[1]); if (s<min||NR==1) {min=s;min_t=$3}; if (s>max||NR==1) {max=s;max_t=$3}} END{print "min:",min_t; print "max:",max_t}' file
min: 1:22:35.227
max: 10:22:35.228

Or, in a more readable form:

#!/usr/bin/awk -f
{
    split($3, t, ":")
    s = t[3] + 60 * (t[2] + 60 * t[1])
    if (s < min || NR == 1) {
        min = s
        min_t = $3
    }
    if (s > max || NR == 1) {
        max = s
        max_t = $3
    }
}

END {
    print "min:", min_t
    print "max:", max_t
}

For each line, we convert the time components (hours, minutes, seconds) from the third field to seconds, which we can then simply compare as numbers. As we iterate, we track the current min and max values, and print them in the END block. Initial values for min and max are taken from the first line (NR==1).
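As a worked example of that conversion, the sketch below applies the same formula to a single value, 1:08:22.921, which is 22.921 + 60 * (8 + 60 * 1) seconds. The variable name result is only for the illustration.

```shell
# Convert one h:mm:ss.sss value to seconds, using the same formula as the script.
result=$(echo "1:08:22.921" | awk '{
    split($1, t, ":")                       # t[1]=hours, t[2]=minutes, t[3]=seconds
    s = t[3] + 60 * (t[2] + 60 * t[1])      # 22.921 + 60 * (8 + 60 * 1)
    printf "%.3f\n", s
}')
echo "$result"
```

This prints 4102.921. Because the comparison is numeric, a value such as 10:22:35.228 correctly ranks above 1:22:35.227 even though the hour fields differ in width.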

Given your statements that the time field is actually a duration and that the hours component is always a single digit, this is all you need:

$ awk 'NR==1{min=max=$3} {min=(min<$3?min:$3); max=(max>$3?max:$3)} END{print "Min:", min ORS "Max:", max}' file
Min: 1:08:22.921
Max: 2:22:35.227

You don't want to run sort inside of awk (even with the proper syntax).

Try this:

sed 1d timings.txt | sort -k3,3n | sed -n '1p; $p'

where

  • the first sed removes the header line
  • sort orders the lines numerically by the 3rd column
  • the second sed prints the first and last lines
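For the headerless sample shown in the question, a sort-based variant could look like the sketch below. It rests on a few assumptions: the times are fixed-width, so a plain text sort on the third column is enough (sort -n reads only the leading number of a key such as 1:08:22.921); -b skips the padding blanks in front of the key; and LC_ALL=C is set only to make the byte-wise ordering explicit. The names tmp and result are chosen for the illustration.

```shell
# Sample rows from the question (no header line), written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Geoff        Audi           2:22:35.227
Bob          Mercedes       1:24:22.338
Derek        Jaguar         1:19:77.693
Dave         Ferrari        1:08:22.921
EOF

# Text sort on column 3, then keep only the first and last lines.
result=$(LC_ALL=C sort -b -k3,3 "$tmp" | sed -n '1p; $p')
rm -f "$tmp"
echo "$result"
```

On that sample the first line printed is the row with 1:08:22.921 and the last is the row with 2:22:35.227.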
