How to merge two files based on a column as a key
How can I merge two files using a column as the key? I want to match the $1 column of a.txt against the $3 column of b.txt and append the matching a.txt line to each line of b.txt.
a.txt
aa; 2.5; 0.001;
ab; 1.5; 0.003;
ac; 0.4; 0.002;
b.txt
20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1463.40; ab;
20-Nov-2014; 1463.40; ac;
20-Nov-2014; 1463.40; ab;
The desired output looks like this:
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
Thanks
$ awk -F';' 'FNR==NR{a[$1]=$0;next;} {print $0" " a[substr($3,2)];}' a.txt b.txt
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
awk implicitly loops over every line in the files. Each line is divided into fields.
-F';'
This tells awk to use the semicolon as the field separator.
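As a rough illustration of what that separator does, the sketch below splits one sample line from b.txt and prints each field in brackets. The variable names `line` and `fields` are only for this example. Note that every field after the first keeps its leading space, and the trailing semicolon produces an empty last field.

```shell
# One sample line taken from b.txt in the question.
line='20-Nov-2014; 1775.00; aa;'

# Split on ';' and wrap each field in brackets so the spaces are visible.
fields=$(printf '%s\n' "$line" |
  awk -F';' '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }')

printf '%s\n' "$fields"
# prints: [20-Nov-2014][ 1775.00][ aa][]
```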
FNR==NR{a[$1]=$0;next;}
NR is the number of lines that have been read in so far across all files, and FNR is the number of lines that have been read in so far from the current file. Consequently, when FNR==NR, we are still reading the first file, a.txt. In that case, this assigns the whole line that was just read in, $0, to array a under the key $1, the first field.
next tells awk to skip the rest of the commands below, jump to the next line, and start over.
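To see why FNR==NR is only true for the first file, here is a small sketch that prints both counters for every line of two throwaway files. The file names first.txt and second.txt and the variable names are made up for this example.

```shell
# Two tiny sample files in a temporary directory (illustrative names).
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/first.txt"
printf 'p\nq\nr\n' > "$tmp/second.txt"

# Print which file we are in, then NR and FNR.
# NR keeps counting across files; FNR restarts at 1 for the second file.
counters=$(awk '{ which = (FNR == NR) ? "first" : "second"; print which, NR, FNR }' \
  "$tmp/first.txt" "$tmp/second.txt")

printf '%s\n' "$counters"
# prints:
# first 1 1
# first 2 2
# second 3 1
# second 4 2
# second 5 3

rm -rf "$tmp"
```

One caveat with this idiom: if the first file is empty, FNR==NR also holds while reading the second file, so the test no longer tells the two files apart.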
print $0" " a[substr($3,2)]
If we get here, we are working on the second file, b.txt. In that case, print each line of this file followed by the line from array a whose key matches the third field.
In file b.txt, the third field starts with a space. When looking up this field in array a, that space is removed with the substr function: substr($3,2) returns the field starting at its second character.
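A quick way to check the leading space and its removal is to print the third field before and after substr. This is only a sketch on one sample line; the variable name `key` is an assumption for the example.

```shell
# Show the raw third field and the same field with its first character dropped.
key=$(printf '%s\n' '20-Nov-2014; 1775.00; aa;' |
  awk -F';' '{ printf "[%s] -> [%s]\n", $3, substr($3, 2) }')

printf '%s\n' "$key"
# prints: [ aa] -> [aa]
```

Because substr($3,2) drops exactly one character, it relies on there being exactly one space after each semicolon, as in the sample data.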
awk -F\; 'NR==FNR{arr[" "$1]=$0;next} {print $0, arr[$3]}' a b
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
$ awk -F'; ?' 'NR==FNR{a[$1]=$0;next} {print $0, a[$3]}' a.txt b.txt
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
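To try this end to end, the sample files from the question can be recreated in a temporary directory, as in the sketch below. It uses the separator '; ?' (a semicolon followed by an optional space), so the keys carry no leading space and no substr call is needed. The temporary directory and the variable name `merged` are assumptions for the example.

```shell
# Recreate the sample input from the question in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/a.txt" <<'EOF'
aa; 2.5; 0.001;
ab; 1.5; 0.003;
ac; 0.4; 0.002;
EOF
cat > "$tmp/b.txt" <<'EOF'
20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1463.40; ab;
20-Nov-2014; 1463.40; ac;
20-Nov-2014; 1463.40; ab;
EOF

# First pass (a.txt): store each line under its first field.
# Second pass (b.txt): append the stored line whose key equals field 3.
merged=$(awk -F'; ?' 'NR==FNR { a[$1] = $0; next } { print $0, a[$3] }' \
  "$tmp/a.txt" "$tmp/b.txt")

printf '%s\n' "$merged"
rm -rf "$tmp"
```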