
comparing multiple files and columns using awk

I have two files, and I would like to match columns 2 and 3 from file1 with columns 2 and 3 from file2. If a match is found, I would like to output the whole line from file2 with column 1 from file1 appended at the end.

I have the following two file types. file2 has a lot of tab-separated columns, but only its columns 2 and 3 need to match columns 2 and 3 from file1.

file1

name1 1 12343442 
name2 2 32434242
name3 3 982793749

file2

a 1 12343442 text1  text2  text3 value0 value2 
a 1 12343442 text1  text2  text3 value2 value3 
a 1 12348888 text1  text2  text3 value0 value2   
b 3 982793749 text1  text4  text3 value1 value11
b 2 982793749 text1  text4  text3 value1 value11

desired output

a 1 12343442 text1  text2  text3 value0 value2 name1
a 1 12343442 text1  text2  text3 value2 value3 name1
b 3 982793749 text1  text4  text3 value1 value11 name3

I have tried doing this with awk, something like:

awk 'BEGIN { FS = "\t" } NR==FNR { a[$1]=$2 FS $3; next} ('$2 FS $3' in a) {print $0, a[$1]}' file1 file2

But it doesn't work. Even if I just try to match the third columns, it does not work. The files are pretty big (over 500 MB), so I would like to read them only once. Any ideas? Thank you!

This one-liner should work:

awk -F'\t' -v OFS='\t' 'NR==FNR{a[$2FS$3]=$1;next}$2FS$3 in a{print $0,a[$2FS$3]}' file1 file2
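To check the one-liner against the sample data, the sketch below recreates file1 and file2 as real tab-separated files in a scratch directory (the directory and the exact file contents are assumptions built from the samples in the question) and runs the same awk program, spread over a few lines with spaces around FS for readability. The comments spell out how the NR==FNR idiom splits the work into two passes.

```shell
#!/bin/sh
# Sketch: rebuild the question's sample data as tab-separated files in a
# throwaway directory (names are illustrative), then run the one-liner.
dir=$(mktemp -d)

printf 'name1\t1\t12343442\nname2\t2\t32434242\nname3\t3\t982793749\n' > "$dir/file1"

{
  printf 'a\t1\t12343442\ttext1\ttext2\ttext3\tvalue0\tvalue2\n'
  printf 'a\t1\t12343442\ttext1\ttext2\ttext3\tvalue2\tvalue3\n'
  printf 'a\t1\t12348888\ttext1\ttext2\ttext3\tvalue0\tvalue2\n'
  printf 'b\t3\t982793749\ttext1\ttext4\ttext3\tvalue1\tvalue11\n'
  printf 'b\t2\t982793749\ttext1\ttext4\ttext3\tvalue1\tvalue11\n'
} > "$dir/file2"

# Pass 1: NR==FNR holds only while reading the first file, so file1's
# column 1 is stored under the key "col2<TAB>col3".
# Pass 2: file2 is streamed; a line whose key exists in the array is
# printed with the stored name appended as one more tab-separated column.
out=$(awk -F'\t' -v OFS='\t' '
  NR == FNR { a[$2 FS $3] = $1; next }
  ($2 FS $3) in a { print $0, a[$2 FS $3] }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$out"
rm -rf "$dir"
```

This prints the three lines of the desired output, separated by tabs. Only file1 is held in memory while file2 is streamed line by line, so each file is read once; that fits the 500 MB case as long as file1's keys fit in RAM. One caveat of the idiom: if file1 is empty, NR equals FNR throughout file2 as well, so nothing is printed.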

In your code:

  • You had a[$1]=$2 FS $3;next, so you mixed up the key and the value. Here you want $2 FS $3 to be the key and $1 to be the value. The lookup when printing then has to use the same key, a[$2 FS $3], not a[$1].
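  • ('$2 FS $3' in a) is not correct either. Those single quotes end the quoted awk program, so the shell expands $2 and $3 itself and splits the program into several arguments. Remove the inner single quotes.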
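To see what the inner single quotes do, the sketch below passes the question's awk program, quoted exactly as in the question, to a small helper that reports the arguments it receives. The helper show_args is hypothetical (it only stands in for awk), and the sketch assumes $2 and $3 are unset, as they usually are in an interactive shell.

```shell
#!/bin/sh
# Sketch: show how the shell splits the question's awk command line.
# show_args is a hypothetical stand-in for awk, not a real tool.

# Clear the positional parameters so $2 and $3 expand to nothing,
# as they typically do in an interactive shell.
set --

# Report how many arguments arrived, then print each one inside <...>.
show_args() {
  printf 'awk would receive %d argument(s):\n' "$#"
  for arg in "$@"; do
    printf '  <%s>\n' "$arg"
  done
}

# Same quoting as the question (file names omitted): the inner quotes
# close and reopen the string, leaving $2 FS $3 unquoted for the shell.
report=$(show_args 'BEGIN { FS = "\t" } NR==FNR { a[$1]=$2 FS $3; next} ('$2 FS $3' in a) {print $0, a[$1]}')
printf '%s\n' "$report"
```

The report shows three arguments: the program text cut off right after the opening parenthesis, a lone FS, and the leftover " in a) {print $0, a[$1]}". awk would reject the truncated program as a syntax error, which is why even the simpler single-column variant failed.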

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.