
Using AWK to find the smallest and largest number in a column?

If I have a file with a few columns, how can I use an AWK command to show the largest and the smallest number in a particular column?

Example:

a  212
b  323
c  23
d  45
e  54
f  102

I want one command to show that the lowest number is 23 and another command to show that the highest number is 323.

I have no idea why the answers are not working! Here is a more realistic example of my file (maybe I should mention that it is tab-delimited):

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  rmdup_wl_25248.bam
Chr10   247     .       T       C       7.8     .       DP=37;AF1=0.5;CI95=0.5,0.5;DP4=7,1,19,0;MQ=15;FQ=6.38;PV4=0.3,1,0.038,1 GT:PL:GQ        0/1:37,0,34:36
Chr10   447     .       A       C       75      .       DP=30;AF1=1;CI95=1,1;DP4=0,0,22,5;MQ=14;FQ=-108 GT:PL:GQ        1/1:108,81,0:99
Chr10   449     .       G       C       35.2    .       DP=33;AF1=1;CI95=0.5,1;DP4=3,2,20,3;MQ=14;FQ=-44;PV4=0.21,1.7e-06,1,0.34        GT:PL:GQ        1/1:68,17,0:31
Chr10   517     .       G       A       222     .       DP=197;AF1=1;CI95=1,1;DP4=0,0,128,62;MQ=24;FQ=-282      GT:PL:GQ        1/1:255,255,0:99
Chr10   761     .       G       A       27      .       DP=185;AF1=0.5;CI95=0.5,0.5;DP4=24,71,8,54;MQ=20;FQ=30;PV4=0.07,8.4e-50,1,1     GT:PL:GQ        0/1:57,0,149:60
Chr10   1829    .       A       G       3.01    .       DP=74;AF1=0.4998;CI95=0.5,0.5;DP4=18,0,54,0;MQ=19;FQ=4.68;PV4=1,9.1e-12,0.003,1 GT:PL:GQ        0/1:30,0,45:28

I should say that I have already added a step that excludes the lines starting with #, so these are the commands that I use:

awk '$1 !~/#/' | awk -F'\t' 'BEGIN{first=1;} {if (first) { max = min = $6; first = 0; next;} if (max < $6) max=$6; if (min > $6) min=$6; } END { print min, max }' wl_25210_filtered.vcf

awk '$1 !~/#/' | awk -F'\t' 'BEGIN{getline;min=max=$6} NF{ max=(max>$6)?max:$6 min=(min>$6)?$6:min} END{print min,max}' wl_25210_filtered.vcf

and

awk '$1 !~/#/' | awk -F'\t' '
NR==2{min=max=$6;next}
NR>2 && NF{
    max=(max>$6)?max:$6
    min=(min>$6)?$6:min
}
END{print min,max}' wl_25210_filtered.vcf

If your file contains empty lines, neither of the posted solutions will work. To handle empty lines correctly, try this:

$ cat f.awk
BEGIN{getline;min=max=$6}
NF{
    max=(max>$6)?max:$6
    min=(min>$6)?$6:min
}
END{print min,max} 

Then run this command:

sed "/^#/d" my_file | awk -f f.awk

First, it reads the first line of the input to set min and max. Then, for each non-empty line, it uses the ternary operator to check whether a new min or max was found. At the end, the result is printed.

HTH Chris
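
A likely reason the pipelines in the question misbehave: the file name is passed to the second awk, so that awk reads the file directly and the # filter in the first awk never sees the data. Here is a minimal sketch that does the filtering and the min/max tracking in a single awk. It assumes a tab-separated file with the value of interest in column 6; the file name sample.vcf and the variable name seen are made up for this illustration.

```shell
# Build a small tab-separated sample (hypothetical file name: sample.vcf).
printf '##comment line\n' > sample.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\n' >> sample.vcf
printf 'Chr10\t247\t.\tT\tC\t7.8\n' >> sample.vcf
printf '\n' >> sample.vcf
printf 'Chr10\t447\t.\tA\tC\t75\n' >> sample.vcf
printf 'Chr10\t517\t.\tG\tA\t222\n' >> sample.vcf
printf 'Chr10\t1829\t.\tA\tG\t3.01\n' >> sample.vcf

# Skip header lines (starting with #) and empty lines, then track the
# min and max of column 6. "seen" records whether min/max are set yet.
result=$(awk -F'\t' '
/^#/ || !NF { next }
!seen { min = max = $6; seen = 1; next }
$6 < min { min = $6 }
$6 > max { max = $6 }
END { if (seen) print min, max }' sample.vcf)

echo "$result"
```

On this sample the command prints 3.01 222. Seeding min and max from the first data line, rather than from the first physical line, keeps the header lines out of the comparison.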

You can create two user-defined functions and use them as needed. This offers a more generic solution.

[jaypal:~/Temp] cat file
a  212
b  323
c  23
d  45
e  54
f  102
[jaypal:~/Temp] awk '
function max(x){i=0;for(val in x){if(i<=x[val]){i=x[val];}}return i;}
function min(x){i=max(x);for(val in x){if(i>x[val]){i=x[val];}}return i;}
{a[$2]=$2;next}
END{minimum=min(a);maximum=max(a);print "Maximum = "maximum " and Minimum = "minimum}' file
Maximum = 323 and Minimum = 23

The above solution has two user-defined functions, max and min. We store column 2 in an array; you can store any of your columns this way. In the END block you can call the functions, store the results in variables, and print them.
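
As a variation on the same idea, here is a sketch that takes the column number as a variable and keeps the loop variables local to each function. The names col and data.txt are hypothetical. Unlike the version above, max does not start from 0, so it should also cope with columns that contain only negative numbers. Note that `+ 0` forces a numeric value, so a non-numeric field would be stored as 0.

```shell
printf 'a  212\nb  323\nc  23\nd  45\ne  54\nf  102\n' > data.txt

result=$(awk -v col=2 '
# Extra parameters (i, k, first) act as local variables in awk.
function max(x,   i, k, first) {
    first = 1
    for (k in x) if (first || x[k] > i) { i = x[k]; first = 0 }
    return i
}
function min(x,   i, k, first) {
    first = 1
    for (k in x) if (first || x[k] < i) { i = x[k]; first = 0 }
    return i
}
NF >= col { a[$col] = $col + 0 }   # store the chosen column as a number
END { print "Maximum = " max(a) " and Minimum = " min(a) }' data.txt)

echo "$result"
```

With the sample data this prints Maximum = 323 and Minimum = 23, matching the output shown above.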

Hope this helps!

Update:

I executed the following on the latest example:

[jaypal:~/Temp] awk '
function max(x){i=0;for(val in x){if(i<=x[val]){i=x[val];}}return i;}
function min(x){i=max(x);for(val in x){if(i>x[val]){i=x[val];}}return i;}
/^#/{next}
{a[$6]=$6;next}
END{minimum=min(a);maximum=max(a);print "Maximum = "maximum " and Minimum = "minimum}' sample
Maximum = 222 and Minimum = 3.01

The max can be found by:

awk 'BEGIN {max = 0} {if ($6>max) max=$6} END {print max}' yourfile.txt

The min can be found by:

awk 'BEGIN {min=1000000; max=0;}; { if($2<min && $2 != "") min = $2; if($2>max && $2 != "") max = $2; } END {print min, max}' file

This will output the minimum and maximum, separated by a space (awk's default output field separator).

awk 'BEGIN{first=1;} 
     {if (first) { max = min = $2; first = 0; next;}
      if (max < $2) max=$2; if (min > $2) min=$2; }
     END { print min, max }' file
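
A comma in a print statement inserts the output field separator (OFS), which is a single space by default. If comma-separated output is wanted, one option is to set OFS on the command line. This is a small sketch with a made-up input file named file.txt:

```shell
printf 'a  212\nb  323\nc  23\n' > file.txt

# -v OFS=',' makes "print min, max" join the two values with a comma.
result=$(awk -v OFS=',' '
NR == 1 { min = max = $2; next }
$2 < min { min = $2 }
$2 > max { max = $2 }
END { print min, max }' file.txt)

echo "$result"
```

For this input the output is 23,323.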

Use the BEGIN and END blocks to initialize and print variables that keep track of the min and max.

For example:

awk 'BEGIN{max=0;min=512} { if (max < $1){ max = $1 }; if(min > $1){ min = $1 } } END{ print max, min}'
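
One caveat with fixed starting values such as max=0 and min=512: they only work when the data actually falls inside that range. The sketch below feeds three negative numbers (made up for this illustration) to the one-liner above and, for comparison, to a variant that seeds both values from the first record.

```shell
# Fixed starting values, as in the one-liner above (column 1 of stdin).
sentinel=$(printf '%s\n' -5 -3 -9 |
  awk 'BEGIN{max=0;min=512} { if (max < $1){ max = $1 }; if(min > $1){ min = $1 } } END{ print max, min}')

# Seeding max and min from the first record instead.
seeded=$(printf '%s\n' -5 -3 -9 |
  awk 'NR==1{max=min=$1; next} { if (max < $1) max = $1; if (min > $1) min = $1 } END{ print max, min }')

echo "sentinel: $sentinel"
echo "seeded:   $seeded"
```

The first command reports 0 -9, where the 0 is the leftover starting value and not a number from the input, while the second reports -3 -9.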

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 