
Use awk to match rows for each column

How can awk be used to find, in each column, the values that match the value in row 2?

I would like to read a tab-delimited file and, for each column, mark any field below row 2 that matches the value in row 2 of that column by appending "match" to it.

For example, transforming this tab-delimited file:

header1 | header2 | header3
1       | 1       | B
--------+---------+----------
3       | 1       | A
2       | A       | B
1       | B       | 1

To this:

header1 | header2 | header3
1       | 1       | B
--------+---------+----------
3       | 1 match | A
2       | A       | B match
1 match | B       | 1

I would go for something like this:

$ cat file
header1 header2 header3
1       1       B
3       1       A
2       A       B
1       B       1
$ awk -v OFS='\t' 'NR == 2 { for (i=1; i<=NF; ++i) a[i] = $i }
    NR > 2 { for(i=1;i<=NF;++i) if ($i == a[i]) $i = $i " match" }1' file
header1 header2 header3
1       1       B
3       1 match A
2       A       B match
1 match B       1

On the second line, populate the array a with the contents of each field. On subsequent lines, append " match" to any field that equals the corresponding value in the array. The 1 at the end is a common shorthand that causes each line to be printed: it is a pattern that is always true, and a pattern with no action prints the record. Setting the output field separator OFS to a tab character preserves the format of the data, because assigning to a field makes awk rebuild the line using OFS.
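As a self-contained illustration of the explanation above, here is a minimal sketch that wraps the same one-liner in a shell function. It assumes the input is strictly tab-delimited and therefore also sets the input separator FS to a tab; the one-liner sets only OFS, so its default whitespace splitting would break a field that contains a space. The function name mark_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: mark fields in rows 3+ that equal the same column in row 2.
# Assumption: input is strictly tab-delimited, so FS is a tab as well as OFS,
# which keeps fields that contain spaces intact.
mark_matches() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 2 { for (i = 1; i <= NF; ++i) ref[i] = $i }   # remember row 2
         NR > 2  { for (i = 1; i <= NF; ++i)
                       if ($i == ref[i]) $i = $i " match" }  # tag matches
         1' "$@"
}

# Usage: feed the sample data (with real tab characters) on standard input.
printf 'header1\theader2\theader3\n1\t1\tB\n3\t1\tA\n2\tA\tB\n1\tB\t1\n' | mark_matches
```

The printf line only builds sample input with real tabs; with an actual file you would call mark_matches file.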

Pedantically, with GNU Awk 4.1.1:

awk -f so.awk so.txt
header1 header2 header3
1       1       B
3       1*      A
2       A       B*
1*      B       1

with so.awk:

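{
    if(1 == NR) {
        print $0;                      # header line, unchanged
    } else if(2 == NR) {
        for(i = 1; i <= NF; i++) {
            answers[i]=$i;             # remember row 2, column by column
        }
        print $0;
    } else {
        for(i = 1; i <= NF; i++) {
            field = $i;
            if(answers[i]==$i) {
                field = field "*" # a match
            }
            # tab only between fields, so there is no trailing tab
            printf("%s%s", (i > 1 ? "\t" : ""), field);
        }
        printf("%s", ORS);             # end the output record
    }
}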

and so.txt as a tab-delimited data file:

header1 header2 header3
1       1       B
3       1       A
2       A       B
1       B       1
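One caveat applies to both answers: when two values both look like numbers, awk's == typically compares them numerically, so a 1.0 below row 2 would be flagged as matching a 1 in row 2. If only exact textual matches should count, a minimal sketch (assuming that is the wanted behaviour; mark_exact is a hypothetical name) forces a string comparison by appending an empty string to each side:

```shell
#!/bin/sh
# Hypothetical helper: same idea as so.awk, but compares fields as strings.
# Concatenating "" turns each value into a plain string, so "1.0" no longer
# equals "1" the way it can under awk's numeric-string comparison.
mark_exact() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 2 { for (i = 1; i <= NF; i++) answers[i] = $i }
         NR > 2  { for (i = 1; i <= NF; i++)
                       if (($i "") == (answers[i] "")) $i = $i "*" }
         1' "$@"
}

# Usage: row 3 has 1.0 (not marked) and B (marked); row 4 has 1 (marked).
printf 'h1\th2\n1\tB\n1.0\tB\n1\tb\n' | mark_exact
```

Comparison remains case-sensitive, so b in the last row does not match B.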

This isn't homework, right...?
