
Awk, printing certain columns based on how rows of different files match

I am pretty sure awk is what I need here. I have two files: for each row of the first file I need to take two pieces of information and use them to look up two numbers in the second file. So if the first file has m7 in its fifth column and 3 in its third column, I want to search the second file for a row that has 3 in its first column and m7 in its fourth column. Then I want to print certain columns from these files as listed below. (The expected output shows that the sixth and fourth columns of the first file are matched the same way.)

Given the following two input files:

file1

1 dog   3   8   m7  n15 
50 cat  5   8   m15 m22
20 fish 6   3   n12 m7  

file2

3   695 842 m7  word
5   847 881 m15 not
8    910 920 n15 important
8   695 842 m22 word
6   312 430 n12 not

I want to produce this output:

pre3   695   842   21
pre5   847   881   50
pre6   312   430   20
pre8   910   920   1
pre8   695   842   50

EDIT:

I also need to produce output of this form:

pre3   695   842   pre8   910   920   1
pre5   847   881   pre8   695   842   50
pre6   312   430   pre3   695   842   20

The answer below works for the original question, but I'm confused by some of its syntax, so I'm not sure how to adjust it to produce this output.

This command:

awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
     NR>FNR && ar[$4,$1] {print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2

outputs "pre" followed by the second file's first, second, and third columns and the first file's first column, for every line where the first file's fifth and third (or sixth and fourth) columns match the second file's fourth and first columns:

pre3 695 842 21
pre5 847 881 50
pre8 910 920 1
pre8 695 842 50
pre6 312 430 20

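(when a key matches more than one line of the first file, their first-column values are summed in ar[$4,$1]; e.g. the key m7,3 matches lines 1 and 3 of file1, so pre3 gets 1 + 20 = 21)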

Note that the output follows the line order of file2, so it is not necessarily sorted. To sort it, pipe the result through sort:

awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
     NR>FNR && ar[$4,$1]{print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2 | sort

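What does the code do?

  • NR==FNR{...} runs only while the first input file is being read: NR counts records across all input files, while FNR restarts at 1 for each file (this assumes the first file is not empty)
  • NR>FNR{...} runs for the 2nd, 3rd, ... input files
  • ar[$5,$3] refers to the array element whose key combines the contents of the 5th and 3rd columns of the current line/record; awk joins the two values with SUBSEP (by default "\034", a non-printing character), not with the field separator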
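
A slightly more robust variant of the first command (a sketch, not part of the original answer) keeps the same keys but tests membership with the in operator instead of the element's value. With ar[$4,$1] as the condition, a key whose summed value is 0 would be skipped, and every failed lookup silently creates an empty element; ($4,$1) in ar avoids both. The sample files are recreated in a temporary directory so the block runs on its own.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '1 dog   3   8   m7  n15' '50 cat  5   8   m15 m22' \
    '20 fish 6   3   n12 m7' > file1
printf '%s\n' '3   695 842 m7  word' '5   847 881 m15 not' \
    '8    910 920 n15 important' '8   695 842 m22 word' \
    '6   312 430 n12 not' > file2

# Pass 1 (file1): sum the 1st column under the keys (5th,3rd) and (6th,4th).
# "next" skips the second rule while file1 is being read.
# Pass 2 (file2): print only when the key (4th,1st) exists in ar.
out=$(awk 'NR==FNR { ar[$5,$3] += $1; ar[$6,$4] += $1; next }
           ($4,$1) in ar { print "pre"$1, $2, $3, ar[$4,$1] }' file1 file2)
printf '%s\n' "$out"
```

On the sample data it prints the same five lines as the original command, in file2 order.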

You could use the following command:

awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4]' f1.txt f2.txt
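
To see what this command does with the sample data (saved as f1.txt and f2.txt, as the answer assumes), here is a small runnable check. Here the key is built by plain string concatenation with FS (a single space by default) rather than with a comma and SUBSEP, and a pattern without an action prints the whole matching line. Only the 3rd/5th column pair of f1.txt is indexed, so the f2.txt rows that would match only through the 4th/6th columns (8 ... n15 and 8 ... m22) are not printed.

```shell
#!/bin/sh
# Recreate the question's sample files under the names used in this answer.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '1 dog   3   8   m7  n15' '50 cat  5   8   m15 m22' \
    '20 fish 6   3   n12 m7' > f1.txt
printf '%s\n' '3   695 842 m7  word' '5   847 881 m15 not' \
    '8    910 920 n15 important' '8   695 842 m22 word' \
    '6   312 430 n12 not' > f2.txt

# f1.txt: remember "3rd 5th" (joined by FS) as a seen key.
# f2.txt: a[...] is 1 for seen keys, so the line is printed unchanged.
out=$(awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4]' f1.txt f2.txt)
printf '%s\n' "$out"
```

It prints the three f2.txt lines for 3/m7, 5/m15 and 6/n12 exactly as they appear in the file, original spacing included.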

If you want to print only specific fields from the matching lines of the second file, use:

awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4] { print "pre"$1" "$2" "$3}' f1.txt f2.txt
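
For the second output format in the EDIT, one possible approach (a sketch, not taken from either answer) reverses the roles of the files: read file2 first and store the text to print for each (4th column, 1st column) pair, then look up both pairs for every line of file1. It assumes each pair occurs at most once in file2, and it simply skips file1 lines where either lookup fails, since the question does not say what should happen then.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '%s\n' '1 dog   3   8   m7  n15' '50 cat  5   8   m15 m22' \
    '20 fish 6   3   n12 m7' > file1
printf '%s\n' '3   695 842 m7  word' '5   847 881 m15 not' \
    '8    910 920 n15 important' '8   695 842 m22 word' \
    '6   312 430 n12 not' > file2

# Pass 1 (file2): index each row by (4th,1st) and store "pre"+1st, 2nd, 3rd.
# Assumption: each (4th,1st) pair appears at most once in file2.
# Pass 2 (file1): look up (5th,3rd) and (6th,4th), then append file1's 1st
# column. Lines where either pair has no match are skipped (assumption).
out=$(awk 'NR==FNR { info[$4,$1] = "pre"$1 OFS $2 OFS $3; next }
           (($5,$3) in info) && (($6,$4) in info) {
               print info[$5,$3], info[$6,$4], $1
           }' file2 file1)
printf '%s\n' "$out"
```

Note that file2 is now passed first, because the lookup table has to be built before file1 is scanned; the fields come out separated by single spaces (OFS) rather than the wider spacing shown in the question.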

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
