
How to merge two files based on a column as a key

How can I merge two files using a column as the key? I want to match the $1 column of a.txt against the $3 column of b.txt and append the matching a.txt line to each line of b.txt.

a.txt
aa; 2.5; 0.001;
ab; 1.5; 0.003;
ac; 0.4; 0.002;

b.txt

20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1463.40; ab;
20-Nov-2014; 1463.40; ac;
20-Nov-2014; 1463.40; ab;

The desired output looks like this:
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

Thanks

$ awk -F';' 'FNR==NR{a[$1]=$0;next;} {print $0" " a[substr($3,2)];}' a.txt b.txt
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

How it works

awk implicitly loops over every line in the files. Each line is divided into fields.

  • -F';'

    This tells awk to use the semicolon as the field separator.

  • FNR==NR{a[$1]=$0;next;}

    NR is the number of lines that have been read in so far, and FNR is the number of lines that have been read in so far from the current file. Consequently, when FNR==NR, we are still reading the first file, a.txt. In that case, this assigns the whole line that was just read in, $0, to array a under the key $1, the first field.

    next tells awk to skip the rest of the commands below, jump to the next line, and start over.

  • print $0" " a[substr($3,2)]

    If we get here, that means we are working on the second file, b.txt. In that case, print each line of this file followed by the line from array a whose key matches the third field.

    In file b.txt, the third field starts with a space. When looking up this field in array a, that space is removed with the substr function: substr($3,2) returns the third field starting from its second character.
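To make the FNR==NR test described above concrete, here is a small sketch that prints NR and FNR for every line of both files. The scratch directory, the shortened b.txt, and the labels "first" and "second" are only illustrative assumptions, not part of the original answer.

```shell
# Recreate the sample files in a scratch directory (b.txt shortened to two lines).
tmp=$(mktemp -d)
printf '%s\n' 'aa; 2.5; 0.001;' 'ab; 1.5; 0.003;' 'ac; 0.4; 0.002;' > "$tmp/a.txt"
printf '%s\n' '20-Nov-2014; 1775.00; aa;' '20-Nov-2014; 1463.40; ab;' > "$tmp/b.txt"

# For each input line, print NR, FNR, and which file the FNR==NR test implies.
trace=$(awk '{ print NR, FNR, (FNR==NR ? "first" : "second") }' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$trace"

# Clean up the scratch directory.
rm -rf "$tmp"
```

While a.txt is being read the two counters stay equal; once awk moves on to b.txt, FNR restarts at 1 while NR keeps counting. Note that the idiom relies on the first file being non-empty: with an empty first file, FNR==NR would also hold for the second file.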

awk -F\; 'NR==FNR{arr[" "$1]=$0;next} {print $0, arr[$3]}'  a b
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

$ awk -F'; ?' 'NR==FNR{a[$1]=$0;next} {print $0, a[$3]}' a.txt b.txt
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
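All of the commands above depend on the exact spacing around the semicolons and assume that every key in b.txt also appears in a.txt. As a hedged variation, not taken from any of the answers, the sketch below trims blanks from the key on both sides and prints a placeholder when no match is found. The trim helper, the NO_MATCH marker, and the zz test key are hypothetical names chosen for illustration.

```shell
# Sample data in a scratch directory; the key zz is deliberately absent from a.txt.
tmp=$(mktemp -d)
printf '%s\n' 'aa; 2.5; 0.001;' 'ab; 1.5; 0.003;' > "$tmp/a.txt"
printf '%s\n' '20-Nov-2014; 1775.00; aa;' '20-Nov-2014; 1463.40; zz;' > "$tmp/b.txt"

merged=$(awk -F';' '
    # Hypothetical helper: strip leading and trailing blanks from a key.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # First file: index each whole line by its trimmed first field.
    FNR == NR { a[trim($1)] = $0; next }
    # Second file: append the matching line, or the NO_MATCH placeholder.
    { k = trim($3); print $0, ((k in a) ? a[k] : "NO_MATCH") }
' "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$merged"

# Clean up the scratch directory.
rm -rf "$tmp"
```

Testing with (k in a) rather than reading a[k] directly avoids creating empty array entries for keys that were never stored.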
