awk: summing values across multiple files, including lines that do not appear in every file

Question

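I have been using awk to sum values across multiple files. I use it to summarize values parsed from server logs, and it really speeds up the final overall count, but I have hit a small problem, and the typical examples I have found on the web have not helped.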

Here is an example:

cat file1
aa 1
bb 2
cc 3
ee 4

cat file2
aa 1
bb 2
cc 3
dd 4

cat file3
aa 1
bb 2
cc 3
ff 4

And the script:

cat test.sh 
#!/bin/bash

files="file1 file2 file3"

i=0;
oldname="";
for names in $(echo $files); do
        ((i++));
        if [ $i == 1 ]; then
                oldname=$names
                #echo "-- $i $names"
                shift;
        else
               oldname1=$names.$$
        awk  'NR==FNR { _[$1]=$2 } NR!=FNR { if(_[$1] != "") nn=0; nn=($2+_[$1]); print $1" "nn }' $names $oldname> $oldname1
        if [ $i -gt 2 ]; then
            rm $oldname;
        fi
                oldname=$oldname1

    fi
done
echo "------------------------------ $i"
cat $oldname

When I run it, keys that appear in every file are added up, but keys that appear in only one of the files are not:

./test.sh 
------------------------------ 3
aa 3
bb 6
cc 9
ee 4

ff and dd do not appear in the list. As far as I can tell, the cause lies in the NR==FNR part.
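One likely reading of the missing dd and ff: the NR!=FNR branch only prints while the second file is being read, so a key that exists only in the first file on the command line is stored but never printed. Below is a minimal sketch of a two-file merge that also emits those leftover keys. It assumes scratch copies of file1 and file2 in a temporary directory; the names tmp, merged and sum are illustrative and not taken from test.sh.

```shell
# Scratch copies of file1 and file2 from the example above (assumption).
tmp=$(mktemp -d)
printf '%s\n' 'aa 1' 'bb 2' 'cc 3' 'ee 4' > "$tmp/file1"
printf '%s\n' 'aa 1' 'bb 2' 'cc 3' 'dd 4' > "$tmp/file2"

merged=$(awk '
    NR == FNR { sum[$1] = $2; next }            # first file: remember every key
    { print $1, $2 + sum[$1]; delete sum[$1] }  # second file: add, then drop the handled key
    END { for (k in sum) print k, sum[k] }      # keys that only the first file had
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

With these two files the leftover key is dd, so it is printed after the lines of file1. If several keys are left over, the order of the END loop is unspecified.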

I came across this:

http://dbaspot.com/shell/246751-awk-comparing-two-files-problem.html

you want all the lines in file1 that are not in file2,
awk 'NR == FNR { a[$0]; next } !($0 in a)' file2 file1

If you want only uniq lines in file1 that are not in file2,
awk 'NR == FNR { a[$0]; next } !($0 in a) { print; a[$0] }' file2 file1

But when I tried it, this only complicated the current problem further, because many of the other fields ended up duplicated.
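For reference, here is a small sketch of what the first quoted idiom does on the sample files. It assumes scratch copies of file1 and file2 in a temporary directory; tmp and only_in_file1 are illustrative names.

```shell
# Scratch copies of file1 and file2 from the example above (assumption).
tmp=$(mktemp -d)
printf '%s\n' 'aa 1' 'bb 2' 'cc 3' 'ee 4' > "$tmp/file1"
printf '%s\n' 'aa 1' 'bb 2' 'cc 3' 'dd 4' > "$tmp/file2"

# Print the lines of file1 that never occur in file2 (whole-line comparison).
only_in_file1=$(awk 'NR == FNR { a[$0]; next } !($0 in a)' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$only_in_file1"
rm -rf "$tmp"
```

Because the comparison is on the whole line ($0), a key whose count differs between the files is treated as a different line. That may be why the idiom is awkward to combine with summing.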

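After posting the question: updates and further tests.

I would like to stick with awk, since it really does seem to be a much shorter way of getting the result, but there is still an issue.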

awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}'  file1 file2 file3
aa 3
bb 6
cc 9
ee 4
ff 4
gg 4
RESULT_SET_4 0
RESULT_SET_3 0
RESULT_SET_2 0
RESULT_SET_1 0
$ cat file1 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4
$ cat file2
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4

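The file layout is not preserved, i.e. the results no longer sit under their headings. My original method did keep that intact.

Updated expected output, with the headings in the correct context: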

cat file1 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ff 4



cat file2 
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
ee 4


cat file3
RESULT_SET_1
aa 1
RESULT_SET_2
bb 2
RESULT_SET_3
cc 3
RESULT_SET_4
gg 4
The awk line in test.sh, run against the files above, is:

awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if  ($2 ~ /[0-9]/)   { nn=($2+_[$1]); print $1" "nn; } else { print;} }else { print; } }' $names $oldname> $oldname1

./test.sh 
------------------------------ 3
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ff 4

This works, but destroys the required format:

  awk '($2 != "")  {a[$1]+=$2};  ($2 == "") {  a[$1]=$2 } END {for (k in a) print k,a[k]} '  file1 file2 file3
    aa 3
    bb 6
    cc 9
    ee 4
    ff 4
    gg 4
    RESULT_SET_4 
    RESULT_SET_3 
    RESULT_SET_2 
    RESULT_SET_1

Answer 1

$ awk '{a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort
aa 3
bb 6
cc 9
dd 4
ee 4
ff 4

Edit:

This is a bit of a hack, but it does the job:

$ awk 'FNR==NR&&!/RESULT/{a[$1]=$2;next}($1 in a){a[$1]+=$2}END{for (k in a) print k,a[k]}' file1 file2 file3 | sort | awk '$1="RESULT_SET_"NR"\n"$1'
RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ff 4

Answer 2

You can do this in awk, as sudo_O suggested, but you can also do it in pure bash.

#!/bin/bash

# We'll use an associative array, where the indexes are strings.
declare -A a

# Our list of files, in an array (not associative)
files=(file1 file2 file3)

# Walk through array of files...
for file in "${files[@]}"; do
  # And for each file, add the value to the array element for its key.
  while read -r index value; do
    ((a[$index]+=$value))
  done < "$file"
done 

# Walk through array. ${!...} returns a list of indexes.
for i in "${!a[@]}"; do
  echo "$i ${a[$i]}"
done

The result looks like this:

$ ./doit
dd 4
aa 3
ee 4
bb 6
ff 4
cc 9

If you want the output sorted... you can pipe it through sort. :)

Answer 3

Here is one way using GNU awk. Run it like this:

awk -f script.awk File1 File2 File3

Contents of script.awk:

sub(/RESULT_SET_/,"") {

    i = $1
    next
}

{
    a[i][$1]+=$2
}

END {
    for (j=1;j<=length(a);j++) {

        print "RESULT_SET_" j

        for (k in a[j]) {
            print k, a[j][k]
        }
    }
}

Results:

RESULT_SET_1
aa 3
RESULT_SET_2
bb 6
RESULT_SET_3
cc 9
RESULT_SET_4
ee 4
ff 4
gg 4

Alternatively, here is the one-liner:

awk 'sub(/RESULT_SET_/,"") { i = $1; next } { a[i][$1]+=$2 } END { for (j=1;j<=length(a);j++) { print "RESULT_SET_" j; for (k in a[j]) print k, a[j][k] } }' File1 File2 File3
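The script above relies on GNU awk arrays of arrays, and the order of for (k in a[j]) is not specified. As a hedged alternative, here is a sketch of the same grouping idea using only composite keys (SUBSEP), which should run in any POSIX awk and prints keys in the order they are first seen. The scratch files mirror the updated file1, file2 and file3 from the question; the names tmp, grouped, sections, keys and count are illustrative.

```shell
# Scratch copies of the updated file1..file3 from the question (assumption).
tmp=$(mktemp -d)
printf '%s\n' RESULT_SET_1 'aa 1' RESULT_SET_2 'bb 2' RESULT_SET_3 'cc 3' RESULT_SET_4 'ff 4' > "$tmp/file1"
printf '%s\n' RESULT_SET_1 'aa 1' RESULT_SET_2 'bb 2' RESULT_SET_3 'cc 3' RESULT_SET_4 'ee 4' > "$tmp/file2"
printf '%s\n' RESULT_SET_1 'aa 1' RESULT_SET_2 'bb 2' RESULT_SET_3 'cc 3' RESULT_SET_4 'gg 4' > "$tmp/file3"

grouped=$(awk '
    NF == 0 { next }                               # ignore blank lines
    /^RESULT_SET_/ {                               # heading: switch the current section
        section = $1
        if (!(section in seen)) { seen[section]; sections[++ns] = section }
        next
    }
    {
        id = section SUBSEP $1                     # composite key: section plus name
        if (!(id in sum)) keys[section, ++count[section]] = $1
        sum[id] += $2
    }
    END {
        for (s = 1; s <= ns; s++) {                # sections in first-seen order
            print sections[s]
            for (n = 1; n <= count[sections[s]]; n++) {
                key = keys[sections[s], n]
                print key, sum[sections[s], key]
            }
        }
    }
' "$tmp/file1" "$tmp/file2" "$tmp/file3")

printf '%s\n' "$grouped"
rm -rf "$tmp"
```

The extra bookkeeping arrays trade a little memory for a stable output order, so no separate sort step is needed to keep each entry under its heading.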

Answer 4

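I fixed it with the script below. It walks through each file, and if an entry exists in the other file but is missing from the current one, it inserts that entry at roughly the same line number with a value of 0, so that the contents can then be summed. I have been testing this on my current output and it seems to work well.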

#!/bin/bash

 files="file1 file2 file3 file4 file5 file6 file7 file8"
RAND="$$"
i=0;
oldname="";
for names in $(echo $files); do
        ((i++));
        if [ $i == 1 ]; then
                oldname=$names
                shift;
        else
               oldname1=$names.$RAND
        for entries in $(awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] == "") { if  ($2 ~ /[0-9]/)   { nn=0; nn=(_[$1]+=$2);  print FNR"-"$1"%0"} else { } } else { } }' $oldname $names); do
                line=$(echo ${entries%%-*})
                content=$(echo ${entries#*-})
                content=$(echo $content|tr "%" " ")

edit=$(ed -s $oldname  << EOF
$line
a
$content
.
w
q
EOF
)

$edit  >/dev/null 2>&1

done

                awk -v i=$i 'NR==FNR { _[$1]=$2 } NR!=FNR { if (_[$1] != "") { if  ($2 ~ /[0-9]/)   { nn=0; nn=($2+_[$1]); print $1" "nn; } else { print $1;} }else { print; } }' $names $oldname> $oldname1
        oldname=$oldname1
    fi
done

cat $oldname
#rm file?.*
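The padding step described in this answer can also be sketched without ed. This is not the script above but a swapped-in, awk-only variant under stated assumptions: it reads the old file twice (once to learn its keys, once to copy it) and the new file in between, and it queues a "key 0" line for every numeric entry the old file lacks. The file names old and new and the variables tmp, padded, have and pad are hypothetical.

```shell
# Scratch stand-ins for $oldname and $names (assumption).
tmp=$(mktemp -d)
printf '%s\n' RESULT_SET_1 'aa 1' RESULT_SET_2 'bb 2' RESULT_SET_3 'cc 3' RESULT_SET_4 'ff 4' > "$tmp/old"
printf '%s\n' RESULT_SET_1 'aa 1' RESULT_SET_2 'bb 2' RESULT_SET_3 'cc 3' RESULT_SET_4 'ee 4' > "$tmp/new"

padded=$(awk '
    FNR == 1 { pass++ }                        # count which file argument is being read
    pass == 1 { have[$1]; next }               # first read of old: collect its keys
    pass == 2 {                                # new: queue numeric entries that old lacks
        if ($2 ~ /^[0-9]+$/ && !($1 in have)) pad[FNR] = pad[FNR] $1 " 0\n"
        last_new = FNR
        next
    }
    { print; last_old = FNR }                  # second read of old: copy the line
    (FNR in pad) { printf "%s", pad[FNR] }     # then add anything queued for this line number
    END {                                      # entries queued past the end of old
        for (n = last_old + 1; n <= last_new; n++)
            if (n in pad) printf "%s", pad[n]
    }
' "$tmp/old" "$tmp/new" "$tmp/old")

printf '%s\n' "$padded"
rm -rf "$tmp"
```

Here the padded copy of old gains an "ee 0" line after "ff 4", so a later summing pass finds ee in both files. The sketch assumes neither file is empty, since the pass counter relies on FNR == 1.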

4 answers

Answer 1: score 3, accepted, 2013-02-14 14:14:17

Answer 2: score 2, 2013-02-14 14:27:27

Answer 3: score 1, 2013-02-14 15:53:54

Answer 4: score 0, 2013-02-15 09:51:45
