
awk - extract unique occurrences of field 1, append multiple instances of field 2 on the same line in output

I am working on extracting service objects (ports/protocols) from a large router configuration file. Using awk, I would like to be able to take unique instances of $5 and print them on one line, with the various values in $7 printed after the unique instance of $5 separated by commas.

Input data:

set resources group port-group ServiceA port '1'
set resources group port-group ServiceA port '2'
set resources group port-group ServiceA port '3'
set resources group port-group ServiceB port '10'
set resources group port-group ServiceA port '1'
set resources group port-group ServiceA port '2'
set resources group port-group ServiceA port '3'
set resources group port-group ServiceB port '10'
set resources group port-group ServiceB port '20'
set resources group port-group ServiceC port '30'
set resources group port-group ServiceC port '40'
set resources group port-group ServiceD port '50'
set resources group port-group ServiceD port '5050'
set resources group port-group ServiceD port '60'
set resources group port-group ServiceD port '65'
set resources group port-group ServiceD port '66'
set resources group port-group ServiceD port '89'

Desired Output:

set resources group port-group ServiceA port 1, 2, 3
set resources group port-group ServiceB port 10, 20
set resources group port-group ServiceC port 30, 40
set resources group port-group ServiceD port 50, 5050, 60, 65, 66, 89

So far my attempts at making awk statements have not been fruitful.

What I've tried (it's part of a script, which is why there are line breaks):

awk '{
gsub(/[:\47]/,"")}; i=!seen[$5]++; {print i,$7 } ' inputfile.txt

This gives me the following output:

set resources group port-group ServiceA port 1
1 1
0 2
0 3
set resources group port-group ServiceB port 8
1 8
0 1
0 2
0 3
0 8
0 3
set resources group port-group ServiceC port 2
1 2
0 3
set resources group port-group ServiceD port 8
1 8
0 5050
0 3
0 83
0 1
0 2
0 990
0 3000
0 3001
0 3002
0 3003

I'm assuming I will have to use a multidimensional array with a for loop to accomplish this, but I'm stuck. Any help is appreciated!

awk solution:

awk '!a[$5]{a[$5]=$0; uniq[$5,$7]=$7}{ if ($5 in a && uniq[$5,$7]!=$7){ 
      a[$5]=a[$5]","$7; uniq[$5,$7]=$7}}END{for(i in a) print a[i]}' inputfile.txt

The output:

set resources group port-group ServiceA port '1','2','3'
set resources group port-group ServiceB port '10','20'
set resources group port-group ServiceC port '30','40'
set resources group port-group ServiceD port '50','5050','60','65','66','89'

  • !a[$5]{a[$5]=$0; uniq[$5,$7]=$7} - captures the whole line at the first occurrence of each unique 5th field value

  • if($5 in a && uniq[$5,$7]!=$7) - checks for duplicate values for the same Service...

  • the uniq array accumulates the unique pairings of the 5th and 7th fields

  • a[$5]=a[$5]","$7 - appends the next unique value to the end of the stored line
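One caveat with the END loop above: awk does not define the order in which `for (i in a)` visits keys, so the services may not be printed in input order. Below is a minimal sketch of one way to keep first-seen order, assuming the same seven-field layout as the question. The names `seen`, `order`, `n` and `k` are illustrative and not part of the original answer, and the sample input is a shortened, reordered subset of the question's data.

```shell
# Sample input: ServiceB appears first on purpose, to show that order is kept.
input="set resources group port-group ServiceB port '10'
set resources group port-group ServiceA port '1'
set resources group port-group ServiceA port '2'
set resources group port-group ServiceB port '10'
set resources group port-group ServiceB port '20'"

out=$(printf '%s\n' "$input" | awk '
!seen[$5,$7]++ {                 # first time this service/port pair appears
    if (!($5 in a)) {            # first time this service appears at all
        order[++n] = $5          # remember the input order of services
        a[$5] = $0               # keep the whole first line
    } else {
        a[$5] = a[$5] "," $7     # append the next unique port
    }
}
END {
    for (k = 1; k <= n; k++)     # numeric loop, so the order is deterministic
        print a[order[k]]
}')
printf '%s\n' "$out"
```

With this input the ServiceB line is printed before the ServiceA line, because ServiceB was seen first.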


To get values without single quotes use the following approach:

group_port_values.awk script:

#!/bin/awk -f
BEGIN { FS="[ ']" }
!a[$5] {
    a[$5] = $0; 
    uniq[$5,$8] = $8
}
{
    if ($5 in a && uniq[$5,$8] != $8) { 
        a[$5] = a[$5]", "$8; 
        uniq[$5,$8] = $8
    }
}
END {
    for (i in a) {
        gsub(/\047/,"",a[i]);
        print a[i]
    }
}

Usage:

awk -f group_port_values.awk inputfile.txt

The output:

set resources group port-group ServiceA port 1, 2, 3
set resources group port-group ServiceB port 10, 20
set resources group port-group ServiceC port 30, 40
set resources group port-group ServiceD port 50, 5050, 60, 65, 66, 89
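A note on why the script reads `$8` rather than `$7`: with `FS="[ ']"` every space and every single quote is a separator, so the space before the opening quote and the quote itself enclose an empty seventh field, and the port number lands in the eighth. The following quick check on a single line from the question illustrates this. The `printf` labels are only for display.

```shell
line="set resources group port-group ServiceA port '1'"

# Split on spaces and single quotes, then show fields 6 to 8 in brackets.
out=$(printf '%s\n' "$line" |
      awk -F"[ ']" '{ printf "$6=[%s] $7=[%s] $8=[%s]\n", $6, $7, $8 }')
printf '%s\n' "$out"
```

The empty brackets for `$7` are the empty field between the space and the opening quote.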

TGIF! Here's one for GNU awk using 2D arrays and bad coding habits (:

$ awk '
++a[$5 , $7]==1 {                   # if not seen before
    b[$5][++c[$5]]=$7 }             # hash it to b[key][index]
END{ 
    for(i in b) {                   # for all keys
        for(j=1;j<=c[i];j++)        # and all its indexes
            d=(j==1?"":d",")b[i][j] # gather buffer
        sub($5,i)                   # use the last known $0
        sub($NF,d)                  # and replace key and buffer to it
        print }                     # output
}' file
set resources group port-group ServiceA port '1','2','3'
set resources group port-group ServiceB port '10','20'
set resources group port-group ServiceC port '30','40'
set resources group port-group ServiceD port '50','5050','60','65','66','89'

$ cat tst.awk
BEGIN { OFS=", " }
{ gsub(/\047/,""); pfx=$1 FS $2 FS $3 FS $4; sfx=$6 }   # strip quotes; sfx keeps the "port" keyword
$5 != prev { prt(prev); prev=$5 }                       # new service: flush the previous group
!seen[$7]++ { ports[++numPorts] = $7 }                  # collect each port once per group
END { prt(prev) }

function prt(sg) {
    if ( sg != "" ) {
        printf "%s %s %s ", pfx, sg, sfx
        for (portNr=1; portNr<=numPorts; portNr++) {
            printf "%s%s", ports[portNr], (portNr<numPorts ? OFS : ORS)
        }
        delete ports
        delete seen
        numPorts = 0
    }
}

$ sort file | awk -f tst.awk
set resources group port-group ServiceA port 1, 2, 3
set resources group port-group ServiceB port 10, 20
set resources group port-group ServiceC port 30, 40
set resources group port-group ServiceD port 50, 5050, 60, 65, 66, 89
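Because this script flushes a group whenever `$5` changes, it depends on `sort` to bring lines for the same service together. A plain `sort` also reorders the ports as strings, so `'10'` would come before `'9'`. If the original port order within each service matters, one option is a stable sort on the fifth field only. This is a sketch that assumes a `sort` supporting the `-s` (stable) option, as GNU and BSD sort do. The sample lines are made up for illustration.

```shell
input="set resources group port-group ServiceB port '9'
set resources group port-group ServiceA port '2'
set resources group port-group ServiceB port '10'
set resources group port-group ServiceA port '1'"

# -k5,5 restricts the sort key to the service name;
# -s keeps lines with equal keys in their input order.
out=$(printf '%s\n' "$input" | sort -s -k5,5)
printf '%s\n' "$out"
```

The grouped result can then be piped into `awk -f tst.awk` in place of the output of the plain `sort file`.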
