Merge multiple input files with awk

I am trying to merge the contents of multiple files with awk, matching rows on a key. I have only seen solutions for two input files, not for more. The input files look like this:

file1

1#a1
2#b1
3#c1
4#d1
6#f1

file2

1#a2
2#b2
3#c2
5#e2
6#f2

file3

1#a3#extra_field_1
2#b3#extra_field_2
3#c3#extra_field_3
4#d3#extra_field_4
5#e3#extra_field_5

The desired output is the following:

output

a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
d1;;d3;extra_field_4
;e2;e3;extra_field_5

For this, I am using a bash script built around an awk command like the following:

$ awk -v OFS=';' -F '#' 'FNR==NR{a[$1]=$2;next} FNR!=NR{b[$1]=$2;next} NF==3{print a[$1],b[$1],$2,$3}' file1 file2 file3 > output

However, it seems to skip some of the inputs, because it doesn't produce any output. Any ideas?

Thanks.
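
A likely reason for the empty output: `FNR!=NR` is true for every line of both file2 and file3, so the second block always runs `next` and the `print` block is never reached. A minimal sketch of one way around this is to count input files with a counter (the name `f` and the scratch directory are assumptions for illustration) and dispatch on that counter instead:

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > "$tmp/file1"
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > "$tmp/file2"
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' '3#c3#extra_field_3' \
              '4#d3#extra_field_4' '5#e3#extra_field_5' > "$tmp/file3"

fixed=$(awk -F'#' -v OFS=';' '
    FNR == 1 { f++ }                  # f = number of the file being read (assumes no empty files)
    f == 1   { a[$1] = $2; next }     # first file: remember the value by key
    f == 2   { b[$1] = $2; next }     # second file: remember the value by key
    { print a[$1], b[$1], $2, $3 }    # third file: emit one merged row per line
' "$tmp/file1" "$tmp/file2" "$tmp/file3")

printf '%s\n' "$fixed"
rm -rf "$tmp"
```

Because the output is driven by the lines of file3, a key that is missing from file1 or file2 yields an empty column, and a key that appears only in file1 and file2 (6 in the sample) is not printed.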

You could do that using just the join command:

join -t\# file1 file2 -j 1 |\
    join -t\# - file3 -j 1 |\
    cut -d\# --output-delimiter=\; -f2-5

Output (only the keys present in all three files are kept):

a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
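
To also keep the rows whose key is missing from file1 or file2, join can be asked to print unpaired lines with `-a` and to use a fixed output layout with `-o`. The following is a sketch under the assumption that all inputs are sorted on the key, as in the question; the scratch directory and the variable name `joined` are only for illustration:

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > "$tmp/file1"
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > "$tmp/file2"
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' '3#c3#extra_field_3' \
              '4#d3#extra_field_4' '5#e3#extra_field_5' > "$tmp/file3"

# Step 1: full outer join of file1 and file2 -> key#value1#value2
#         (-a 1 -a 2 keeps unpaired lines from both sides; -o fixes the columns,
#         so a missing value becomes an empty field).
# Step 2: join that with file3, keeping every file3 line (-a 2) and dropping the key.
# Step 3: switch the delimiter from '#' to ';'.
joined=$(join -t'#' -a 1 -a 2 -o 0,1.2,2.2 "$tmp/file1" "$tmp/file2" |
    join -t'#' -a 2 -o 1.2,1.3,2.2,2.3 - "$tmp/file3" |
    tr '#' ';')

printf '%s\n' "$joined"
rm -rf "$tmp"
```

With the sample data this prints the five rows asked for in the question, including `d1;;d3;extra_field_4` and `;e2;e3;extra_field_5`. Adding `-a 1` to the second join would keep key 6 as well.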

Another approach using paste and awk:

paste -d"#" file1 file2 file3 | awk -F"#" '{print $2,$4,$6,$7}' OFS=";"

Using awk with three files is too complicated for me, so I'll offer something else. Using paste:

for x in $(paste -d"#" a b c); do x=${x#\#}; x=${x//\#\#/\;}; echo ${x//\#/;};done

Paste is my go-to tool for merging; from there, pure Bash or tr can do the job if awk is not available. There is a problem with using "" as the paste delimiter: it makes the first column (file) disappear. I'm not sure why, but that is the reason for using something else ("#" here), which leaves a double "##" as the delimiter in paste's output.

Another option is to read all the files line by line in pure Bash, but I think that's overkill.

Here's one in awk. It doesn't take missing data into consideration, as you did not state in the question how it should be handled. It collects all the data into an array keyed on the first field and prints it in the END block:

$ awk '
BEGIN { FS="#"; OFS=";" }
{
    for(i=2;i<=NF;i++)
        a[$1]=a[$1] (a[$1]==""?"":OFS) $i
}
END {
    for(i in a)
        print a[i]
}' f1 f2 f3
a1;a2;a3;extra_field_1
b1;b2;b3;extra_field_2
c1;c2;c3;extra_field_3
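
If missing data should show up as empty columns, the same hashing idea can be extended to any number of files by storing each value under a (key, file number) pair. This is only a sketch: the names `cell`, `width`, `keys` and `nfiles` are made up for illustration, non-empty input files are assumed, and rows come out in the order in which keys are first seen, whereas `for (i in a)` gives no guaranteed order.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '1#a1' '2#b1' '3#c1' '4#d1' '6#f1' > "$tmp/file1"
printf '%s\n' '1#a2' '2#b2' '3#c2' '5#e2' '6#f2' > "$tmp/file2"
printf '%s\n' '1#a3#extra_field_1' '2#b3#extra_field_2' '3#c3#extra_field_3' \
              '4#d3#extra_field_4' '5#e3#extra_field_5' > "$tmp/file3"

merged=$(awk -F'#' -v OFS=';' '
    FNR == 1 { nfiles++ }                    # a new input file has started
    {
        if (!($1 in seen)) {                 # remember keys in first-seen order
            seen[$1] = 1
            keys[++nkeys] = $1
        }
        val = $2                             # glue all non-key fields together with OFS
        for (i = 3; i <= NF; i++)
            val = val OFS $i
        cell[$1, nfiles] = val
        if (NF - 1 > width[nfiles])          # widest row seen in this file
            width[nfiles] = NF - 1
    }
    END {
        for (k = 1; k <= nkeys; k++) {
            line = ""
            for (f = 1; f <= nfiles; f++) {
                if ((keys[k], f) in cell) {
                    v = cell[keys[k], f]
                } else {                     # key absent from file f: emit empty columns
                    v = ""
                    for (i = 1; i < width[f]; i++)
                        v = v OFS
                }
                line = (f == 1 ? v : line OFS v)
            }
            print line
        }
    }
' "$tmp/file1" "$tmp/file2" "$tmp/file3")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Unlike the desired output in the question, this keeps every key, so key 6 appears as `f1;f2;;`. Whether such rows should be printed is a design choice the question leaves open.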

