
awk: sum up multiple files and show lines that do not appear in every file

I have been using awk to sum up multiple files. The files hold summaries of server log parsing values, and awk really does speed up the final overall count, but I have hit a minor problem that the typical examples on the web have not helped with.

Here is the example:

cat file1
aa 1
bb 2
cc 3
ee 4

cat file2
aa 1
bb 2
cc 3
dd 4

cat file3
aa 1
bb 2
cc 3
ff 4

And the script:

cat test.sh 
#!/bin/bash

files="file1 file2 file3"

i=0;
oldname="";
for names in $(echo $files); do
    ((i++));
    if [ $i == 1 ]; then
        oldname=$names
        #echo "-- $i $names"
        shift;
    else
        oldname1=$names.$$
        awk  'NR==FNR { _[$1]=$2 } NR!=FNR { if(_[$1] != "") nn=0; nn=($2+_[$1]); print $1" "nn }' $names $oldname> $oldname1
        if [ $i -gt 2 ]; then
            rm $oldname;
        fi
        oldname=$oldname1
    fi
done
echo "------------------------------ $i"
cat $oldname

When I run this, the matching entries are added up, but entries that appear in only one of the files are not:

./test.sh 
------------------------------ 3
aa 3
bb 6
cc 9
ee 4

ff and dd do not appear in the list. From what I have seen, the problem is within the NR==FNR logic.
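One way to see why keys go missing in the pairwise merge: the awk program only prints while it reads its second input, so keys that exist only in the first input (the one loaded into the array) are never emitted. Below is a minimal sketch of a merge step that also flushes those leftover keys in an END block. The demo_old and demo_new files and the pending array are hypothetical names used for illustration; they are not part of the original script.

```shell
#!/bin/bash
# Hypothetical two-file demo: demo_old holds the running totals,
# demo_new is the next file to merge in.
tmp=$(mktemp -d)
printf 'aa 1\nee 4\n' > "$tmp/demo_old"
printf 'aa 1\ndd 4\n' > "$tmp/demo_new"

merged=$(awk '
    # First input: remember every key and its value.
    NR == FNR { pending[$1] = $2; next }
    # Second input: print the sum and mark the key as handled.
    { print $1, $2 + pending[$1]; delete pending[$1] }
    # Keys still pending were only in the first input; print them too.
    END { for (k in pending) print k, pending[k] }
' "$tmp/demo_new" "$tmp/demo_old")

echo "$merged"
rm -r "$tmp"
```

With these demo files the leftover key dd is printed after the lines of demo_old. When several keys are left over, the order of the `for (k in pending)` loop is unspecified.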

I have come across this:

http://dbaspot.com/shell/246751-awk-comparing-two-files-problem.html

you want all the lines in file1 that are not in file2,
awk 'NR == FNR { a[$0]; next } !($0 in a)' file2 file1

If you want only uniq lines in file1 that are not in file2,
awk 'NR == FNR { a[$0]; next } !($0 in a) { print; a[$0] }' file2 file1

but this only complicates the current issue further when attempted, since lots of other fields get duplicated.

After posting the question: updates to the content and tests.

I wanted to stick with awk since it does appear to be a much shorter way of achieving the result, but there is still a problem.

awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}'  file1 file2 file3
aa 3
bb 6
cc 9
ee 4
ff 4
gg 4
RESULT_SET_4 0
RESULT_SET_3 0
RESULT_SET_2 0
RESULT_SET_1 0
$ cat file1 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4
$ cat file2
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4

The file content is not left as it was originally, i.e. the results are not under their headings; my original method did keep it all intact.

Updated expected output - headings in correct context

cat file1 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4



cat file2 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4


cat file3
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
gg 4

The awk line in test.sh used with these files is:

awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if  ($2 ~ /[0-9]/)   { nn=($2+_[$1]); print $1" "nn; } else { print;} }else { print; } }' $names $oldname> $oldname1

./test.sh 
------------------------------ 3
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ff 4

This works but destroys the required formatting:

awk '($2 != "")  {a[$1]+=$2};  ($2 == "") {  a[$1]=$2 } END {for (k in a) print k,a[k]} '  file1 file2 file3
aa 3
bb 6
cc 9
ee 4
ff 4
gg 4
RESULT_SET_4 
RESULT_SET_3 
RESULT_SET_2 
RESULT_SET_1 

$ awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort
aa 3
bb 6
cc 9
dd 4
ee 4
ff 4

Edit:

It's a bit of a hack but it does the job:

$ awk 'FNR==NR&&!/RESULT/{a[$1]=$2;next}($1 in a){a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort | awk '$1="RESULT_SET_"NR"\n"$1'
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ff 4

You can do this in awk, as sudo_O suggested, but you can also do it in pure bash.

#!/bin/bash

# We'll use an associative array, where the indexes are strings.
# (Associative arrays need bash 4 or later.)
declare -A a

# Our list of files, in an array (not associative)
files=(file1 file2 file3)

# Walk through array of files...
for file in "${files[@]}"; do
  # And for each line, add the value to the array element at that index.
  while read -r index value; do
    ((a[$index]+=$value))
  done < "$file"
done 

# Walk through array. ${!...} returns a list of indexes.
for i in "${!a[@]}"; do
  echo "$i ${a[$i]}"
done

And the result:

$ ./doit
dd 4
aa 3
ee 4
bb 6
ff 4
cc 9

And if you want the output sorted, you can pipe it through sort. :)
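As a rough sketch of the sorting step, the final loop can be piped straight into sort. The sample input, the totals array, and the sorted variable below are made up for illustration, and the script assumes bash 4 or later for associative arrays.

```shell
#!/bin/bash
# Assumes bash 4+ (associative arrays). Sample data is hypothetical.
declare -A totals

# Sum the second column per key; a missing key starts from 0.
while read -r key value; do
  totals[$key]=$(( ${totals[$key]:-0} + value ))
done <<'EOF'
dd 4
aa 1
ee 4
aa 2
EOF

# The loop order is arbitrary, so pipe the whole loop through sort.
sorted=$(for key in "${!totals[@]}"; do
  echo "$key ${totals[$key]}"
done | sort)

echo "$sorted"
```

Sorting the loop output as a whole keeps the script simple. The associative array itself has no defined order.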

Here's one way using GNU awk. Run like:

awk -f script.awk File1 File2 File3

Contents of script.awk:

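# Arrays of arrays (a[i][k]) need gawk 4.0 or later.

# Heading line: strip the prefix and remember the set number.
sub(/RESULT_SET_/,"") {
    i = $1
    next
}

# Data line: add the value to the total for this set and key.
{
    a[i][$1]+=$2
}

# Print each heading followed by its totals.
# The key order within a set is unspecified.
END {
    for (j=1;j<=length(a);j++) {
        print "RESULT_SET_" j
        for (k in a[j]) {
            print k, a[j][k]
        }
    }
}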

Results:

RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ee 4
ff 4
gg 4

Alternatively, here's the one-liner:

awk 'sub(/RESULT_SET_/,"") { i = $1; next } { a[i][$1]+=$2 } END { for (j=1;j<=length(a);j++) { print "RESULT_SET_" j; for (k in a[j]) print k, a[j][k] } }' File1 File2 File3
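If GNU awk is not available, the same idea can be sketched in POSIX awk by joining the heading and the key with SUBSEP instead of using arrays of arrays. This is only an illustration under a few assumptions: headings start with RESULT_SET_, data lines have a key and a numeric value, and the f1, f2, f3 demo files and the mk helper are hypothetical. Unlike `for (k in ...)`, this version prints keys in first-seen order.

```shell
#!/bin/bash
tmp=$(mktemp -d)
# Hypothetical helper: writes a small file whose last key is $1.
mk() { printf 'RESULT_SET_1\naa 1\nRESULT_SET_2\n%s 4\n' "$1"; }
mk ff > "$tmp/f1"; mk ee > "$tmp/f2"; mk gg > "$tmp/f3"

out=$(awk '
    # Heading line: remember the current section and its first-seen order.
    /^RESULT_SET_/ {
        section = $1
        if (!(section in seen)) { seen[section] = 1; order[++n] = section }
        next
    }
    # Data line: sum per (section, key); record new keys in first-seen order.
    NF >= 2 {
        id = section SUBSEP $1
        if (!(id in sum)) keys[section] = keys[section] " " $1
        sum[id] += $2
    }
    END {
        for (s = 1; s <= n; s++) {
            print order[s]
            m = split(keys[order[s]], list, " ")
            for (j = 1; j <= m; j++)
                print list[j], sum[order[s] SUBSEP list[j]]
        }
    }
' "$tmp/f1" "$tmp/f2" "$tmp/f3")

echo "$out"
rm -r "$tmp"
```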

Fixed using the script below. Basically it goes through each file; if an entry exists in one file but is missing from the other, it adds the entry at the approximate line number with a value of 0 so that the contents can be summed. I have been testing this on my current output and it seems to be working really well.

#!/bin/bash

files="file1 file2 file3 file4 file5 file6 file7 file8"
RAND="$$"
i=0
oldname=""
for names in $files; do
    ((i++))
    if [ "$i" -eq 1 ]; then
        oldname=$names
    else
        oldname1=$names.$RAND
        # List entries that are in $names but missing from $oldname,
        # encoded as "<line number>-<key>%0".
        for entries in $(awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] == "") { if  ($2 ~ /[0-9]/)   { nn=0; nn=(_[$1]+=$2);  print FNR"-"$1"%0"} else { } } else { } }' "$oldname" "$names"); do
            line=${entries%%-*}
            content=${entries#*-}
            content=$(echo "$content" | tr "%" " ")

            # Append "<key> 0" after that line of $oldname (edits the file in place).
            ed -s "$oldname" >/dev/null 2>&1 << EOF
$line
a
$content
.
w
q
EOF
        done

        # Sum the values of $oldname with the matching values from $names.
        awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if  ($2 ~ /[0-9]/)   { nn=0; nn=($2+_[$1]); print $1" "nn; } else { print $1;} }else { print; } }' "$names" "$oldname" > "$oldname1"
        oldname=$oldname1
    fi
done

cat "$oldname"
#rm file?.*
