Use bash to cluster based on one column of a line

Question

The input is as below

For each category (labelled A, B, C, ... in the first column), I'd like to find the minimum of the second column and the maximum of the third column, i.e. the widest range. The expected output is:

A  15  240
B  65  300
C  34  400

How can I do this using bash?

Answer 1

Using awk:

awk '
    !($1 in min) { min[$1] = $2; max[$1] = $3; next }
    {
        min[$1] = ( $2 < min[$1] ? $2 : min[$1] )
        max[$1] = ( $3 > max[$1] ? $3 : max[$1] )
    } 
    END {
        for(x in min) print x, min[x], max[x]
}' file
A 15 240
B 65 300
C 34 400

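We iterate over each line and store the minimum and maximum values in two arrays keyed by the first column. In the END block we loop over the keys and print each key with its values from both arrays. Note that awk does not guarantee the order in which for (x in min) visits the keys.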
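As a minimal sketch of the same idea with a deterministic output order: the function name range_by_key and the sample rows are made up for illustration, and the values are forced to numbers with +0 so the comparisons are numeric. The result is piped through sort so the keys come out in the same order on every run.

```shell
# range_by_key (hypothetical name): print "key min max" for each key in
# column 1, where min is taken over column 2 and max over column 3.
range_by_key() {
    awk '
        # First time we see a key: initialise both arrays.
        !($1 in min) { min[$1] = $2 + 0; max[$1] = $3 + 0; next }
        {
            if ($2 + 0 < min[$1]) min[$1] = $2 + 0
            if ($3 + 0 > max[$1]) max[$1] = $3 + 0
        }
        # Key order in the END loop is unspecified, so sort afterwards.
        END { for (x in min) print x, min[x], max[x] }
    ' "$@" | sort
}

# Illustrative sample rows (an assumption, not the original input).
printf '%s\n' 'A 20 240' 'B 65 210' 'C 90 400' 'A 15 150' 'C 34 320' 'B 80 300' |
    range_by_key
```

With no arguments the function reads standard input; with a file name, as in range_by_key file, it reads that file.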

Answer 2

I tried to write another solution that works around the side effect of unset variables in awk. (It may be slightly more optimized.)

cat min_max

#!/bin/bash
awk '
    NF!=3 || $2 $3 ~ "[^0-9-]" {next;}           # skip lines without 3 fields or with non-numeric values
    min[$1]=="" {min[$1]=$2; max[$1]=$3; next;}  # first occurrence of an ID: set min and max, read next line
    min[$1]>$2  {min[$1]=$2;}                    # later occurrences: update min if needed
    max[$1]<$3  {max[$1]=$3;}                    # update max if needed
    END {for(x in min)printf("%-2s %5d %5d\n", x, min[x], max[x]);}
' "$1"

cat in4

A  20  240
B  65  210
C  90  400
A  15  150
C  34  320
E  -30  -20
D   0  100
B  80  300
D  10   90
E  -20 -10

./min_max in4

A     15   240
B     65   300
C     34   400
D      0   100
E    -30   -10

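This bash script produces the same result (the order of the output lines may differ, because bash does not guarantee the iteration order of associative arrays).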

cat min_max2

#!/bin/bash
(($#!=1))&& { echo "Usage: $0 inpfile"; exit 1; }
declare -A min max                                   # define associative arrays
while read -r id mn mx; do
   [[ ${min[$id]+any} == "" ]] && { min[$id]=$mn; max[$id]=$mx; continue; } # parameter expansion: empty only if the key is unset
   (( ${min[$id]} > $mn )) && min[$id]=$mn
   (( ${max[$id]} < $mx )) && max[$id]=$mx
done <"$1"
for i in "${!min[@]}"; do printf "%-2s %5d %5d\n" "$i" "${min[$i]}" "${max[$i]}"; done
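
The ${min[$id]+any} test in the script above relies on the ${parameter+word} expansion, which yields word when the parameter is set (even to an empty string) and nothing when it is unset. A small sketch of that behaviour, assuming bash 4 or later for associative arrays; the array seen and the helper key_is_set are hypothetical names used only here:

```shell
#!/bin/bash
declare -A seen            # hypothetical associative array for the demo

# key_is_set KEY: succeed if KEY exists in "seen", even with an empty value.
key_is_set() { [[ -n ${seen[$1]+x} ]]; }

seen[A]=""                 # set, but empty

key_is_set A && echo "A is set"
key_is_set B || echo "B is unset"
```

Comparing the value itself against "" could not tell these two cases apart, which is why the script tests the expansion instead.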

solution1
2 2016-04-25 04:40:38

solution2
0 2016-04-25 10:24:09
