
awk: sum up multiple files, including lines that do not appear in every file

I have been using awk to sum up multiple files. The files hold summaries of values parsed from server logs, and summing them with awk really does speed up the final overall count. However, I have hit a minor problem, and the typical examples I have found on the web have not helped.

Here is the example:

cat file1
aa 1
bb 2
cc 3
ee 4

cat file2
aa 1
bb 2
cc 3
dd 4

cat file3
aa 1
bb 2
cc 3
ff 4

And the script:

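cat test.sh
#!/bin/bash

files="file1 file2 file3"

i=0
oldname=""
for names in $(echo $files); do
    ((i++))
    if [ $i == 1 ]; then
        # The first file becomes the initial running total.
        oldname=$names
        shift
    else
        oldname1=$names.$$
        # Load $names into _, then print each line of $oldname with the
        # matching value from $names added on. Only keys present in
        # $oldname are printed.
        awk 'NR==FNR { _[$1]=$2 } NR!=FNR { if(_[$1] != "") nn=0; nn=($2+_[$1]); print $1" "nn }' $names $oldname > $oldname1
        # Delete the previous intermediate file (never the original file1).
        if [ $i -gt 2 ]; then
            rm $oldname
        fi
        oldname=$oldname1
    fi
done
echo "------------------------------ $i"
cat $oldname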

When I run this, the matching keys are added up, but some keys that appear in only one of the files are missing:

./test.sh 
------------------------------ 3
aa 3
bb 6
cc 9
ee 4

dd and ff do not appear in the list. From what I have seen, the problem is within the NR==FNR logic.
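The reason dd and ff vanish: lines are printed only while awk reads the second file argument ($oldname), so a key that exists only in the first argument ($names) is stored in _ but never printed. A minimal sketch of a pairwise sum that also prints those leftover keys from an END block; the helper name sum_two is hypothetical, and each file is assumed to hold plain "key value" lines with unique keys:

```shell
#!/bin/bash
# Sketch: add two "key value" files together, keeping keys that appear
# in only one of them. sum_two is a hypothetical helper name.
sum_two() {
    awk '
        NR == FNR { first[$1] = $2; next }   # 1st file: remember each value
        {
            print $1, $2 + first[$1]          # 2nd file: add the 1st-file value (0 if absent)
            seen[$1] = 1
        }
        END {
            for (k in first) {                # keys that only the 1st file had
                if (!(k in seen)) print k, first[k]
            }
        }
    ' "$1" "$2"
}
```

With the three files from the question, sum_two file2 file1 prints aa 2, bb 4, cc 6, ee 4 and then dd 4; chaining the result with file3 adds ff 4 as well. As with any NR==FNR idiom, this misbehaves if the first file is empty.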

I have come across this:

http://dbaspot.com/shell/246751-awk-comparing-two-files-problem.html

If you want all the lines in file1 that are not in file2:
awk 'NR == FNR { a[$0]; next } !($0 in a)' file2 file1

If you want only the unique lines in file1 that are not in file2:
awk 'NR == FNR { a[$0]; next } !($0 in a) { print; a[$0] }' file2 file1
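The idiom quoted above compares whole lines ($0), so "aa 1" and "aa 2" count as different lines. For this problem the comparison has to be on the key in the first field. A minimal sketch of the same idiom keyed on $1; the helper name only_in_second is hypothetical:

```shell
#!/bin/bash
# Print the lines of the 2nd file whose key (field 1) never appears in
# the 1st file. only_in_second is a hypothetical helper name.
only_in_second() {
    awk '
        NR == FNR { seen[$1]; next }   # 1st file: record its keys
        !($1 in seen)                  # 2nd file: print lines whose key is new
    ' "$1" "$2"
}
```

With the question's files, only_in_second file1 file2 prints dd 4 and only_in_second file2 file1 prints ee 4: exactly the lines a pairwise sum has to carry over with no partner.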

but when I tried it, it only complicated the current issue further, since lots of other fields got duplicated.

Updates after posting the question, with further tests:

I wanted to stick with awk, since it appears to be a much shorter way of achieving the result, but there is still a problem:

awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}'  file1 file2 file3
aa 3
bb 6
cc 9
ee 4
ff 4
gg 4
RESULT_SET_4 0
RESULT_SET_3 0
RESULT_SET_2 0
RESULT_SET_1 0
$ cat file1 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4
$ cat file2
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4

The file content is not kept in its original layout, i.e. the results are no longer under their headings. My original method did keep all of that intact.

Updated example, with the headings in their correct context:

cat file1 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4



cat file2 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4


cat file3
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
gg 4

The updated awk line in test.sh is:

awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if  ($2 ~ /[0-9]/)   { nn=($2+_[$1]); print $1" "nn; } else { print;} }else { print; } }' $names $oldname> $oldname1

./test.sh 
------------------------------ 3
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ff 4

The following sums correctly but destroys the required formatting (the headings end up at the bottom, separated from their values):

awk '($2 != "")  {a[$1]+=$2};  ($2 == "") {  a[$1]=$2 } END {for (k in a) print k,a[k]} '  file1 file2 file3
aa 3
bb 6
cc 9
ee 4
ff 4
gg 4
RESULT_SET_4
RESULT_SET_3
RESULT_SET_2
RESULT_SET_1
Sum everything into a single associative array and sort the result:

$ awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort
aa 3
bb 6
cc 9
dd 4
ee 4
ff 4

Edit:

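It's a bit of a hack, but it does the job (note that it only sums keys that are present in file1, and it assumes one key per RESULT_SET section):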

$ awk 'FNR==NR&&!/RESULT/{a[$1]=$2;next}($1 in a){a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort | awk '$1="RESULT_SET_"NR"\n"$1'
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ff 4

You can do this in awk, as sudo_O suggested, but you can also do it in pure bash (4.0 or later, which added associative arrays).

#!/bin/bash

# We'll use an associative array, where the indexes are strings.
declare -A a

# Our list of files, in an ordinary indexed array.
files=(file1 file2 file3)

# Walk through the array of files...
for file in "${files[@]}"; do
  # ...and for each "key value" line, add the value to that key's entry.
  while read -r index value; do
    ((a[$index]+=$value))
  done < "$file"
done

# Walk through the array. ${!a[@]} expands to the list of keys.
for i in "${!a[@]}"; do
  echo "$i ${a[$i]}"
done

And the result:

$ ./doit
dd 4
aa 3
ee 4
bb 6
ff 4
cc 9

And if you want the output sorted, you can pipe it through sort. :)
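The bash version above assumes every line is "key value"; on the RESULT_SET_n heading lines, value is empty and ((a[$index]+=$value)) hits an arithmetic syntax error, and the heading context would be lost anyway. A sketch of an extension (not part of the original answer) that keys each sum on heading plus key and records first-seen order, so no sort is needed. It assumes bash 4+, two fields per data line, and that a line with a single word is a heading; sum_sections is a hypothetical name:

```shell
#!/bin/bash
# Sketch (bash 4+): sum "key value" lines across files, per heading section.
# A line with no second field is treated as a heading (e.g. RESULT_SET_1).
# Sections and keys are printed in the order they are first seen.
# sum_sections is a hypothetical helper name.
sum_sections() {
    local -A total=() known=()
    local -a sections=() order=()
    local heading key value file k s
    for file in "$@"; do
        heading=""                                   # each file starts outside any section
        while read -r key value; do
            [ -z "$key" ] && continue                # skip blank lines
            if [ -z "$value" ]; then                 # heading line
                heading=$key
                continue
            fi
            if [ -z "${known["s:$heading"]+x}" ]; then   # "s:" prefix keeps the subscript non-empty
                known["s:$heading"]=1
                sections+=("$heading")
            fi
            k="$heading $key"                        # composite key: heading + key
            if [ -z "${total[$k]+x}" ]; then
                order+=("$k")                        # remember first-seen order
            fi
            total[$k]=$(( ${total[$k]:-0} + value ))
        done < "$file"
    done
    for s in "${sections[@]}"; do
        if [ -n "$s" ]; then echo "$s"; fi
        for k in "${order[@]}"; do
            if [ "${k%% *}" = "$s" ]; then
                echo "${k#* } ${total[$k]}"
            fi
        done
    done
}
```

Run as sum_sections file1 file2 file3 on the headed files from the question, it keeps each sum under its RESULT_SET heading, with ff, ee and gg listed under RESULT_SET_4 in the order the files introduce them. On headerless files it simply prints the sums in first-seen order.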

Here's one way using GNU awk (it relies on arrays of arrays, a gawk 4.0 extension). Run it like this:

awk -f script.awk File1 File2 File3

Contents of script.awk:

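# Heading line: strip the "RESULT_SET_" prefix and remember the set number.
sub(/RESULT_SET_/,"") {

    i = $1
    next
}

# Data line: add its value to this key within the current set.
{
    a[i][$1]+=$2
}

# Print the sets in numeric order. Within a set, keys come out in
# gawk's default (unspecified) for-in order.
END {
    for (j=1;j<=length(a);j++) {

        print "RESULT_SET_" j

        for (k in a[j]) {
            print k, a[j][k]
        }
    }
}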

Results:

RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ee 4
ff 4
gg 4

Alternatively, here's the one-liner:

awk 'sub(/RESULT_SET_/,"") { i = $1; next } { a[i][$1]+=$2 } END { for (j=1;j<=length(a);j++) { print "RESULT_SET_" j; for (k in a[j]) print k, a[j][k] } }' File1 File2 File3
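Arrays of arrays (a[i][$1]) are a gawk extension, so the script above will not run in mawk or BSD awk. A sketch of the same idea for POSIX awk, using a composite subscript joined with SUBSEP instead; it also prints keys in first-seen order rather than for-in order. The helper name sum_sections_awk is hypothetical, and a line with exactly one field is assumed to be a heading:

```shell
#!/bin/bash
# Sketch: per-section sums with a composite subscript (section SUBSEP key)
# instead of gawk arrays of arrays. sum_sections_awk is a hypothetical name.
sum_sections_awk() {
    awk '
        FNR == 1 { sec = "" }                     # each file starts outside any heading
        NF == 1 {                                 # heading line, e.g. RESULT_SET_1
            sec = $1
            if (!(sec in known)) { known[sec]; secs[++ns] = sec }
            next
        }
        NF >= 2 {
            if (!(sec in known)) { known[sec]; secs[++ns] = sec }
            k = sec SUBSEP $1                     # composite subscript: heading + key
            if (!(k in sum)) keys[++nk] = k       # remember first-seen order
            sum[k] += $2
        }
        END {
            for (i = 1; i <= ns; i++) {
                if (secs[i] != "") print secs[i]
                for (j = 1; j <= nk; j++) {
                    split(keys[j], part, SUBSEP)
                    if (part[1] == secs[i]) print part[2], sum[keys[j]]
                }
            }
        }
    ' "$@"
}
```

Usage mirrors the gawk version: sum_sections_awk File1 File2 File3. The nested loop in END is quadratic in the number of keys, which should be fine for summary files of this size.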

Fixed using this script. Basically, it goes through each file, and if an entry exists only in the new file, it inserts that entry into the running file at approximately the same line number with a 0 value, so that the final awk pass can sum it up. I have been testing this on my current output and it seems to be working really well.

#!/bin/bash

files="file1 file2 file3 file4 file5 file6 file7 file8"
RAND="$$"
i=0
oldname=""
for names in $files; do
    ((i++))
    if [ "$i" -eq 1 ]; then
        # The first file becomes the initial running total.
        oldname=$names
    else
        oldname1=$names.$RAND
        # List each key that is in $names but missing from $oldname as
        # "LINE-KEY%0", where LINE is its line number (FNR) in $names.
        for entries in $(awk 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] == "" && $2 ~ /[0-9]/) { _[$1]+=$2; print FNR"-"$1"%0" } }' "$oldname" "$names"); do
            line=${entries%%-*}
            content=$(echo "${entries#*-}" | tr "%" " ")
            # Append "KEY 0" after roughly the same line in $oldname.
            ed -s "$oldname" >/dev/null <<EOF
$line
a
$content
.
w
q
EOF
        done

        # Add the values from $names to matching keys in $oldname; headings
        # and keys missing from $names are passed through unchanged.
        awk 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if ($2 ~ /[0-9]/) { print $1" "($2+_[$1]) } else { print $1 } } else { print } }' "$names" "$oldname" > "$oldname1"
        oldname=$oldname1
    fi
done

cat "$oldname"
#rm file?.*

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM