Awk, printing certain columns based on how rows of different files match
I am pretty sure awk is the tool I need. I have one file with the information I need and a second file from which I want to obtain two numbers, looked up by two pieces of information from the first file. So if the first file has m7 in its fifth column and 3 in its third column, I want to search the second file for a row that has 3 in its first column and m7 in its fourth column. Then I want to print certain columns from these files, as listed below.
Given the following two input files:
file1
1 dog 3 8 m7 n15
50 cat 5 8 m15 m22
20 fish 6 3 n12 m7
file2
3 695 842 m7 word
5 847 881 m15 not
8 910 920 n15 important
8 695 842 m22 word
6 312 430 n12 not
I want to produce the output
pre3 695 842 21
pre5 847 881 50
pre6 312 430 20
pre8 910 920 1
pre8 695 842 50
EDIT:
I also need to produce output of the form
pre3 695 842 pre8 910 920 1
pre5 847 881 pre8 695 842 50
pre6 312 430 pre3 695 842 20
The answer below works for the original question, but I'm confused by some of its syntax, so I'm not sure how to adjust it to produce this output.
This command:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1] {print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2
outputs pre followed by the second file's first column, then the second file's second and third columns and the first file's first column, for all lines in which the first file's fifth and third (or sixth and fourth) columns are identical to the second file's fourth and first columns:
pre3 695 842 21
pre5 847 881 50
pre8 910 920 1
pre8 695 842 50
pre6 312 430 20
(If more than one line of the first file matches the same key, the first-column values of those lines are summed in ar[$4,$1]; here 1 + 20 = 21 for m7 and 3.)
Note that the output is not necessarily sorted! To achieve this, pipe it through sort:
awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1]{print "pre"$1,$2,$3,ar[$4,$1]}' file1 file2 | sort
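To try the sorted variant end to end, here is a minimal sketch that recreates the sample data from the question in a scratch directory. The variable names dir and sorted are made up for this illustration, and LC_ALL=C is an added assumption to make the sort order predictable across systems.

```shell
# Scratch directory holding the sample data from the question (hypothetical location)
dir=$(mktemp -d)
printf '%s\n' '1 dog 3 8 m7 n15' '50 cat 5 8 m15 m22' '20 fish 6 3 n12 m7' > "$dir/file1"
printf '%s\n' '3 695 842 m7 word' '5 847 881 m15 not' '8 910 920 n15 important' \
              '8 695 842 m22 word' '6 312 430 n12 not' > "$dir/file2"

# Same awk program as above; LC_ALL=C forces a plain byte-wise sort
sorted=$(awk 'NR==FNR{ar[$5,$3]=$1+ar[$5,$3]; ar[$6,$4]=$1+ar[$6,$4]}
NR>FNR && ar[$4,$1]{print "pre"$1,$2,$3,ar[$4,$1]}' "$dir/file1" "$dir/file2" | LC_ALL=C sort)

printf '%s\n' "$sorted"
# pre3 695 842 21
# pre5 847 881 50
# pre6 312 430 20
# pre8 695 842 50
# pre8 910 920 1

rm -r "$dir"
```

A plain sort compares whole lines, so the two pre8 lines come out ordered by their second field, which differs from the order listed in the question.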
What does the code do?
NR==FNR{...} works on the first input file only.
NR>FNR{...} works on the 2nd, 3rd, ... input files.
ar[$5,$3] creates an array element whose key is the content of the 5th and 3rd column of the current line / record, joined by SUBSEP (the subscript separator, "\034" by default; not the field separator).
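The two idioms explained above can be observed in isolation. The following is a small sketch rather than part of the original answer; the scratch directory and the names dir, trace and keys are assumptions made up for the demonstration.

```shell
# Two tiny scratch files (hypothetical, only for this demonstration)
dir=$(mktemp -d)
printf '%s\n' '1 dog 3 8 m7 n15' '20 fish 6 3 n12 m7' > "$dir/file1"
printf '%s\n' '3 695 842 m7 word' > "$dir/file2"

# NR counts records over all files, FNR restarts at 1 for each file,
# so NR==FNR is true only while the first file is being read
trace=$(awk 'NR==FNR { print "first file: NR=" NR " FNR=" FNR; next }
                     { print "later file: NR=" NR " FNR=" FNR }' "$dir/file1" "$dir/file2")
printf '%s\n' "$trace"
# first file: NR=1 FNR=1
# first file: NR=2 FNR=2
# later file: NR=3 FNR=1

# A comma inside the subscript joins the parts with SUBSEP, not with FS
keys=$(awk 'BEGIN {
    ar["m7", 3] = 21
    if (("m7" SUBSEP 3) in ar)  print "key is m7 SUBSEP 3"
    if (!(("m7" FS 3) in ar))   print "key is not m7 FS 3"
}')
printf '%s\n' "$keys"
# key is m7 SUBSEP 3
# key is not m7 FS 3

rm -r "$dir"
```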
You could use the command below, which prints the lines of the second file whose first and fourth columns match the third and fifth columns of some line in the first file:
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4]' f1.txt f2.txt
If you want to print only specific fields from the matching lines in the second file, use it like below:
awk 'NR==FNR {a[$3 FS $5]=1;next } a[$1 FS $4] { print "pre"$1" "$2" "$3}' f1.txt f2.txt
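Neither command above produces the two-lookups-per-line output requested in the EDIT. One possible approach, offered as a sketch and not taken from the original answers, is to read file2 first and store its rows, then walk file1 and look up both column pairs. The array name row, the key variables k1 and k2, and the scratch directory are assumptions made up for this illustration; lines of file1 with a missing match are silently skipped here.

```shell
# Scratch directory holding the sample data from the question (hypothetical location)
dir=$(mktemp -d)
printf '%s\n' '1 dog 3 8 m7 n15' '50 cat 5 8 m15 m22' '20 fish 6 3 n12 m7' > "$dir/file1"
printf '%s\n' '3 695 842 m7 word' '5 847 881 m15 not' '8 910 920 n15 important' \
              '8 695 842 m22 word' '6 312 430 n12 not' > "$dir/file2"

# Note the argument order: file2 is read first so that its rows can be looked up
paired=$(awk '
NR==FNR {                       # first file on the command line (file2)
    row[$4, $1] = "pre" $1 " " $2 " " $3
    next
}
{                               # second file on the command line (file1)
    k1 = $5 SUBSEP $3           # same key format as row[$4, $1]
    k2 = $6 SUBSEP $4
    if ((k1 in row) && (k2 in row))
        print row[k1], row[k2], $1
}' "$dir/file2" "$dir/file1")

printf '%s\n' "$paired"
# pre3 695 842 pre8 910 920 1
# pre5 847 881 pre8 695 842 50
# pre6 312 430 pre3 695 842 20

rm -r "$dir"
```

Because file1 is processed last, the output follows the line order of file1, and testing with `in` avoids creating empty array elements for keys that do not exist.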