
Awk, printing certain columns based on how rows of different files match

I am pretty sure awk is the tool I need here. I have one file with the information I need, and a second file from which I want to obtain two numbers, looked up using two pieces of information from the first file. So if the first file has m7 in its fifth column and 3 in its third column, I want to search the second file for a row that has 3 in its first column and m7 in its fourth column. Then I want to print certain columns from these files, as listed below.

Given the following two input files:

file1

1 dog   3   8   m7  n15 
50 cat  5   8   m15 m22
20 fish 6   3   n12 m7  

file2

3   695 842 m7  word
5   847 881 m15 not
8    910 920 n15 important
8   695 842 m22 word
6   312 430 n12 not

I want to produce the output:

pre3   695   842   21
pre5   847   881   50
pre6   312   430   20
pre8   910   920   1
pre8   695   842   50

EDIT:

I also need to produce output of the form:

pre3   695   842   pre8   910   920   1
pre5   847   881   pre8   695   842   50
pre6   312   430   pre3   695   842   20

The answer below works for the original question, but I'm confused by some of its syntax, so I'm not sure how to adjust it to produce this output.

This command:

awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
     NR>FNR && ar[$4,$1] {print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2

outputs pre followed by the second file's first, second, and third columns, plus the first file's first column, for every line in which the first file's fifth and third (or sixth and fourth) columns are identical to the second file's fourth and first columns:

pre3 695 842 21
pre5 847 881 50
pre8 910 920 1
pre8 695 842 50
pre6 312 430 20

(for lines with more than one match, the values of ar[$4,$1] are summed up)
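To see the summing in isolation, here is a minimal sketch that only reads file1 and dumps the array. The scratch directory, the variable name sums and the array name part are my own choices for illustration, not part of the original command; `+=` is just a shorter spelling of the `ar[k]=$1+ar[k]` form used above.

```shell
# Recreate file1 from the question in a scratch directory (illustrative path).
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
1 dog   3   8   m7  n15
50 cat  5   8   m15 m22
20 fish 6   3   n12 m7
EOF

# Build the same array as the command above, then print every key with its sum.
# "for (k in ar)" visits keys in an unspecified order, hence the sort.
sums=$(awk '
    { ar[$5,$3] += $1; ar[$6,$4] += $1 }
    END {
        for (k in ar) {
            split(k, part, SUBSEP)   # split the composite key back into its two parts
            print part[1], part[2], ar[k]
        }
    }' "$workdir/file1" | LC_ALL=C sort)
printf '%s\n' "$sums"
```

The key for m7 and 3 is set by the first line (1) and again by the third line (20), so it should come out as 21, which is where the 21 in the pre3 line comes from.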

Note that the output is not necessarily sorted! To sort it, pipe it through sort:

awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
     NR>FNR && ar[$4,$1]{print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2 | sort

What does the code do?

  • NR==FNR{...} works on the first input file only
  • NR>FNR{...} works on the 2nd, 3rd, ... input files
  • ar[$5,$3] refers to an array element whose key is the content of the 5th and 3rd columns of the current line / record (joined by the subscript separator SUBSEP, not by the field separator)
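For the paired output asked for in the EDIT, the same idiom can be turned around: read file2 first to build a lookup table, then walk through file1. This is only a sketch under some assumptions of mine: each (column 4, column 1) pair occurs at most once in file2, a file1 line is printed only when both of its pairs are found, and the names loc, paired and workdir are made up for the example.

```shell
# Recreate both sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
1 dog   3   8   m7  n15
50 cat  5   8   m15 m22
20 fish 6   3   n12 m7
EOF
cat > "$workdir/file2" <<'EOF'
3   695 842 m7  word
5   847 881 m15 not
8    910 920 n15 important
8   695 842 m22 word
6   312 430 n12 not
EOF

# Note the argument order: file2 is read first, file1 second.
paired=$(awk '
    # While reading file2: store "pre<col1> col2 col3" under the key (col4, col1).
    # Assumption: a repeated key would simply overwrite the earlier entry.
    NR == FNR { loc[$4,$1] = "pre" $1 OFS $2 OFS $3; next }
    # While reading file1: print only if both (col5,col3) and (col6,col4) were seen.
    (($5,$3) in loc) && (($6,$4) in loc) { print loc[$5,$3], loc[$6,$4], $1 }
' "$workdir/file2" "$workdir/file1")
printf '%s\n' "$paired"
```

The `(key) in loc` test checks for existence without creating an empty array element, and the output follows the line order of file1. Columns are separated by a single space (the default OFS) rather than the three spaces shown in the question.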

You could use the command below:

awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4]' f1.txt f2.txt

If you want to print only specific fields from the matching lines in the second file, use it like this:

awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4] { print "pre"$1" "$2" "$3}' f1.txt f2.txt
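Note that the command above only looks at columns 3 and 5 of the first file, so with the sample data it would report the pre3, pre5 and pre6 lines but neither pre8 line. If the (column 4, column 6) pair should count as well, as in the expected output of the question, a possible variant looks like this. The array name seen, the variable matches and the scratch directory are illustrative, and unlike the first answer this sketch does not append the first file's first column.

```shell
# Recreate the sample data under the file names used in this answer.
workdir=$(mktemp -d)
cat > "$workdir/f1.txt" <<'EOF'
1 dog   3   8   m7  n15
50 cat  5   8   m15 m22
20 fish 6   3   n12 m7
EOF
cat > "$workdir/f2.txt" <<'EOF'
3   695 842 m7  word
5   847 881 m15 not
8    910 920 n15 important
8   695 842 m22 word
6   312 430 n12 not
EOF

matches=$(awk '
    # First file: remember both pairs, (col3, col5) and (col4, col6).
    NR == FNR { seen[$3 FS $5] = 1; seen[$4 FS $6] = 1; next }
    # Second file: print the selected fields when (col1, col4) was remembered.
    ($1 FS $4) in seen { print "pre" $1, $2, $3 }
' "$workdir/f1.txt" "$workdir/f2.txt")
printf '%s\n' "$matches"
```

With the sample data, all five lines of the second file have a match, and they are printed in the order in which they appear in f2.txt.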
