Awk printing out smallest and highest number, in a time format

I'm fairly new to linux/bash shell and I'm really having trouble printing two values (the highest and lowest) from a particular column in a text file. The file is formatted like this:

Geoff        Audi           2:22:35.227
Bob          Mercedes       1:24:22.338
Derek        Jaguar         1:19:77.693
Dave         Ferrari        1:08:22.921

As you can see the final column is a timing, I'm trying to use awk to print out the highest and lowest timing in the column. I'm really stumped, I've tried:

awk '{print sort -n <  $NF}' timings.txt 

However that didn't even seem to sort anything, I just received an output of:

1
0
1
0
...

This repeated over and over. The real output was longer, but I've cut it short since the pattern is clear after the first few lines.

My desired output would be:

Min: 1:08:22.921
Max: 2:22:35.227   

After the question was clarified: if the time field always has the same number of digits in the same places, e.g. h:mm:ss.sss, the solution can be drastically simplified. We no longer need to convert the time to seconds in order to compare it; a simple string (lexicographical) comparison is enough:

$ awk 'NR==1 {m=M=$3} {$3<m&&m=$3; $3>M&&M=$3} END {printf("min: %s\nmax: %s\n",m,M)}' file
min: 1:08:22.921
max: 2:22:35.227

The logic is the same as in the earlier script shown further down, but it uses a simpler string-only comparison to order the values (to determine min/max). This works because all timings conform to the same format, so if a < b as strings (for example "1:22:33" < "1:23:00"), then a is also the smaller time. If the values are not consistently formatted, lexicographical comparison alone cannot order them: for example, "12:00:00" < "3:00:00" holds as a string comparison even though 12 hours is longer than 3 hours.
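
To see when string comparison is and is not safe, here is a small sketch. The helper name str_lt is hypothetical (it is not part of the answer); it forces awk to compare its two arguments as strings and reports the result.

```shell
# str_lt A B: hypothetical helper; prints "yes" if A sorts before B
# when awk compares them as strings, "no" otherwise.
str_lt() {
    # Concatenating "" forces string (not numeric) comparison.
    awk -v a="$1" -v b="$2" 'BEGIN { print (((a "") < (b "")) ? "yes" : "no") }'
}

str_lt "1:22:33" "1:23:00"    # yes: same width, so string order matches time order
str_lt "12:00:00" "3:00:00"   # yes: "1" sorts before "3", although 12 h > 3 h
str_lt "3:00:00" "12:00:00"   # no
```

Zero-padding the hours ("03:00:00") would make the widths equal again and restore the correct order.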

So, on the first value read (first record, NR==1 ), we set the initial min and max to the timing in the 3rd field. For each record we test whether the current value is smaller than the current min, and if it is, we set the new min; similarly for the max. We use short-circuit evaluation instead of if to keep the expressions short ( $3<m && m=$3 is equivalent to if ($3<m) m=$3 ). In the END block we simply print the result.
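
For readers who find the short-circuit idiom hard to read, the same logic can be spelled out with explicit if statements. This is only a sketch: the wrapper function min_max_fixed is a hypothetical name, and it assumes the timing is the third field and always has the fixed h:mm:ss.sss layout.

```shell
# min_max_fixed FILE: hypothetical helper; assumes field 3 holds
# fixed-width h:mm:ss.sss timings, so string order equals time order.
min_max_fixed() {
    awk '
        NR == 1 { min = max = $3 }      # seed both values from the first record
        {
            if ($3 < min) min = $3      # string comparison: the field is not a plain number
            if ($3 > max) max = $3
        }
        END { printf "Min: %s\nMax: %s\n", min, max }
    ' "$1"
}
```

Called as min_max_fixed timings.txt on the four sample lines from the question, it prints Min: 1:08:22.921 and Max: 2:22:35.227.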


Here's a general awk solution that accepts time strings with a variable number of digits for hours/minutes/seconds per record (the output shown comes from a different sample file than the one in the question):

$ awk '{split($3,t,":"); s=t[3]+60*(t[2]+60*t[1]); if (s<min||NR==1) {min=s;min_t=$3}; if (s>max||NR==1) {max=s;max_t=$3}} END{print "min:",min_t; print "max:",max_t}' file
min: 1:22:35.227
max: 10:22:35.228

Or, in a more readable form:

#!/usr/bin/awk -f
{
    split($3, t, ":")
    s = t[3] + 60 * (t[2] + 60 * t[1])
    if (s < min || NR == 1) {
        min = s
        min_t = $3
    }
    if (s > max || NR == 1) {
        max = s
        max_t = $3
    }
}

END {
    print "min:", min_t
    print "max:", max_t
}

For each line, we convert the time components (hours, minutes, seconds) in the third field to a total number of seconds, which can then be compared as plain numbers. As we iterate, we track the current minimum and maximum together with their original time strings, and print those strings in the END block. The initial values for min and max are taken from the first line ( NR==1 ).
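
As a worked example of the conversion step, the sketch below applies the same formula to a single value. The function name to_seconds is hypothetical, and the %.3f output format is an assumption chosen to match the three decimal places in the sample data.

```shell
# to_seconds TIME: hypothetical helper; converts h:mm:ss.sss to seconds
# using the same formula as the script: s = t[3] + 60 * (t[2] + 60 * t[1])
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, ":")                                  # p[1]=hours, p[2]=minutes, p[3]=seconds
        printf "%.3f\n", p[3] + 60 * (p[2] + 60 * p[1])
    }'
}

to_seconds 1:08:22.921     # 22.921 + 60 * (8 + 60 * 1)   = 4102.921
to_seconds 10:22:35.228    # 35.228 + 60 * (22 + 60 * 10) = 37355.228
```

Because the result is a number, a two-digit hours value such as 10:22:35.228 compares correctly against 1:08:22.921, which a plain string comparison would not guarantee.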

Given your statements that the time field is actually a duration and the hours component is always a single digit, this is all you need:

$ awk 'NR==1{min=max=$3} {min=(min<$3?min:$3); max=(max>$3?max:$3)} END{print "Min:", min ORS "Max:", max}' file
Min: 1:08:22.921
Max: 2:22:35.227

You don't want to run sort inside of awk (even with the proper syntax).

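Try this:

sed 1d timings.txt | sort -k3,3 | sed -n '1p; $p'

where

  • the first sed removes the header line (drop this stage if the file has no header, as in the sample shown in the question; otherwise the first data row is lost)
  • sort orders the lines by the 3rd column as a string; a numeric sort ( -k3,3n ) would only compare the leading hours digit, while string order is correct here because every timing has the same h:mm:ss.sss layout
  • the second sed prints the first and last lines, i.e. the rows with the lowest and highest timing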
