Use awk to match rows for each column
How can awk be used to find values that match row 2 in each column?
I would like to read a tab-delimited file and, for each column, if any row below row 2 matches the value in row 2, print that field with "match" appended.
Transforming this tab-delimited file:
header1 | header2 | header3
1 | 1 | B
--------+---------+----------
3 | 1 | A
2 | A | B
1 | B | 1
To this:
header1 | header2 | header3
1 | 1 | B
--------+---------+----------
3 | 1 match | A
2 | A | B match
1 match | B | 1
I would go for something like this:
$ cat file
header1 header2 header3
1 1 B
3 1 A
2 A B
1 B 1
$ awk -v OFS='\t' 'NR == 2 { for (i=1; i<=NF; ++i) a[i] = $i }
NR > 2 { for(i=1;i<=NF;++i) if ($i == a[i]) $i = $i " match" }1' file
header1 header2 header3
1 1 B
3 1 match A
2 A B match
1 match B 1
On the second line, populate the array a with the contents of each field. On subsequent lines, append " match" to any field that equals the corresponding value in the array. The 1 at the end is a common shorthand: it is an always-true pattern with no action, so every line is printed. Setting the output field separator OFS to a tab character preserves the format of the data, because awk rebuilds a line with OFS whenever one of its fields is modified.
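The command above splits input on any whitespace, which is fine for the sample data. If fields may themselves contain spaces, one option is to set the input separator FS to a tab as well. The following is a minimal sketch under that assumption; the function name mark_matches is made up for illustration and reads from standard input.

```shell
# mark_matches is a hypothetical helper name, not part of the original answer.
# It reads tab-delimited text on stdin and appends " match" to any field
# below row 2 that equals the row-2 value in the same column.
mark_matches() {
    awk 'BEGIN { FS = OFS = "\t" }    # split and rejoin on tabs only
         NR == 2 { for (i = 1; i <= NF; ++i) a[i] = $i }
         NR > 2  { for (i = 1; i <= NF; ++i) if ($i == a[i]) $i = $i " match" }
         1'
}

# Sample data written with explicit tabs so the delimiters are unambiguous.
printf 'header1\theader2\theader3\n1\t1\tB\n3\t1\tA\n2\tA\tB\n1\tB\t1\n' | mark_matches
```

With FS set to a tab, a field such as "New York" stays in one piece instead of being split into two columns.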
Pedantically, with GNU Awk 4.1.1:
awk -f so.awk so.txt
header1 header2 header3
1 1 B
3 1* A
2 A B*
1* B 1
with so.awk:
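{
    if (NR == 1) {
        print $0                     # header row: print unchanged
    } else if (NR == 2) {
        for (i = 1; i <= NF; i++) {
            answers[i] = $i          # remember the reference value for each column
        }
        print $0
    } else {
        for (i = 1; i <= NF; i++) {
            field = $i
            if (answers[i] == $i) {
                field = field "*"    # a match
            }
            # tab between fields only, so no trailing tab is left on the line
            printf("%s%s", field, (i < NF ? "\t" : ""))
        }
        printf("%s", ORS)            # ORS is the output record separator (RS is for input)
    }
}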
and so.txt as a tab-delimited data file:
header1 header2 header3
1 1 B
3 1 A
2 A B
1 B 1
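One subtlety with the equality test against answers[i]: awk treats input fields that look like numbers as numeric strings, so the comparison can be numeric, and a value such as 1.0 in a later row may be marked as matching 1. If an exact textual match is wanted, concatenating an empty string to both operands forces a string comparison. This is a sketch under that assumption; mark_exact is an illustrative name, and tab-separated input on standard input is assumed.

```shell
# mark_exact is a hypothetical helper name used only for this illustration.
# It marks a field with "*" only when its text is identical to the row-2 value.
mark_exact() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 2 { for (i = 1; i <= NF; i++) answers[i] = $i }
         # Appending "" turns both operands into strings, so "1.0" != "1".
         NR > 2  { for (i = 1; i <= NF; i++) if (($i "") == (answers[i] "")) $i = $i "*" }
         1'
}

# Row 3 holds 1.0 in the first column; it is not marked, while B in column 2 is.
printf 'h1\th2\n1\tB\n1.0\tB\n1\tb\n' | mark_exact
```

The comparison is still case-sensitive, so b in the last row is not treated as a match for B.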
This isn't homework, right...?