简体   繁体   中英

How to merge two files based on column as a key

How to merge two files based on column as a key, match to $1 column from a.txt and append b.txt based on $3 column

a.txt
aa; 2.5; 0.001;
ab; 1.5; 0.003;
ac; 0.4; 0.002;

b.txt

20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1463.40; ab;
20-Nov-2014; 1463.40; ac;
20-Nov-2014; 1463.40; ab;

Desired output look like this 
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

Thanks
$ awk -F';' 'FNR==NR{a[$1]=$0;next;} {print $0" " a[substr($3,2)];}' a.txt b.txt
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

How it works

awk implicitly loops over every line in the files. Each line is divided into fields.

  • -F';'

    This tells awk to use the semicolon as the field separator.

  • FNR==NR{a[$1]=$0;next;}

    NR is the number of lines that have been read in so far and FNR is the number of lines that have been read in so far from the current file. Consequently, when FNR==NR , we are still reading the first file, a.txt . In that case, this sets assigns the whole line that was just read in, $0 , to array a under the key $1 , the third field.

    next tells awk to skip the rest of the commands below and jump to the next line and start over.

  • print $0" " a[substr($3,2)]

    If we get here, that means we are working on the second file b.txt . In that case, print each line of this file followed by the line from array a with the key matching the third field.

    In file b.txt , the third field starts with a space. When looking up this field in array a , that space is removed with the substr function.

awk -F\; 'NR==FNR{arr[" "$1]=$0;next} {print $0, arr[$3]}'  a b
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
$ awk -F'; ?' 'NR==FNR{a[$1]=$0;next} {print $0, a[$3]}' a.txt b.txt
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM