
Comparing two files field by field in Linux

I am trying to compare two files (separated by a comma and a space) using three fields (fields 1, 2, and 5 from file1 and fields 1, 2, and 5 from file2). If the records match, I want the whole record of file2 concatenated with the last field of file1, using awk. For example, file1:

1, 4, abebe, kebede, 25, 101, 42
1, 4, abebe, debebe, 42, 201, 47
1, 4, abebech, kebede, 17, 33, 57

file2:

1, 4, abebe, kebede, 25, 101, 42
1, 4, Tesse, debo, 25, 101, 42
1, 4, derartu, tulu, 25, 101, 42

output:

42, 1, 4, abebe, kebede, 25, 101, 42
47, 1, 4, Tesse, debo, 25, 101, 42
57, 1, 4, derartu, tulu, 25, 101, 42

I am new to Linux. Any help is appreciated.

My first reading of the problem lends itself to this solution:

awk '{getline t < "file2"; split( t, a );
    if( a[1]a[2]a[5] == $1$2$5) print $NF",", t}' file1
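
The same line-by-line idea can be sketched with a few defensive changes. This is only an illustration under stated assumptions: the scratch directory, the variable names (dir, other, pairwise), the explicit ', ' field separator, and the check on the return value of getline are all additions that the answer above does not use.

```shell
# Hypothetical scratch directory holding the sample data from the question.
dir=$(mktemp -d)
printf '%s\n' \
  '1, 4, abebe, kebede, 25, 101, 42' \
  '1, 4, abebe, debebe, 42, 201, 47' \
  '1, 4, abebech, kebede, 17, 33, 57' > "$dir/file1"
printf '%s\n' \
  '1, 4, abebe, kebede, 25, 101, 42' \
  '1, 4, Tesse, debo, 25, 101, 42' \
  '1, 4, derartu, tulu, 25, 101, 42' > "$dir/file2"

# Walk file1 and, for each record, read the line of file2 at the same position.
pairwise=$(awk -F', ' -v other="$dir/file2" '
{
    # Stop when file2 has no more lines (or cannot be read).
    if ((getline t < other) <= 0) exit
    split(t, a, ", ")
    # Compare fields 1, 2, and 5 one by one rather than as one concatenated string.
    if (a[1] == $1 && a[2] == $2 && a[5] == $5)
        print $NF ", " t
}' "$dir/file1")

printf '%s\n' "$pairwise"
rm -rf "$dir"
```

With the sample data, only the first pair of lines agrees on fields 1, 2, and 5, so a single line is printed. Comparing the fields individually avoids the ambiguity of joining them without a separator.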

But it appears that the question is actually: 'Given file1, in which we know that any records with the same fields 1, 2, and 5 also have the same final field, find all lines in file2 with corresponding fields 1, 2, and 5, and output each such line with the final field from file1 prepended.' In which case the solution given by Dennis works.

Since fields 1, 2, and 5 of record 1 in file1 match all the records in file2, I have listed the files as arguments in the opposite order to get the output you want.

awk 'BEGIN {OFS = ", "} NR == FNR {a[$1, $2, $5] = $NF; next} $1 SUBSEP $2 SUBSEP $5 in a {print a[$1, $2, $5], $0}' file2 file1

The NR == FNR block forms a loop that reads the file which appears first in the argument list into an array. When the record number (NR) and the per-file record number (FNR) are no longer equal, processing continues with the file named as the second argument.
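
To see why NR == FNR singles out the first file, a small trace can help. The sketch below uses two throwaway files with made-up contents (first and second are hypothetical names, not part of the question) and prints both counters for every record.

```shell
# Hypothetical two-file setup, used only to watch NR and FNR change.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first"
printf 'c\nd\ne\n' > "$dir/second"

# NR counts records across all files; FNR restarts at 1 for each file.
trace=$(awk '{ print NR, FNR, (NR == FNR ? "first file" : "later file"), $0 }' \
    "$dir/first" "$dir/second")

printf '%s\n' "$trace"
rm -rf "$dir"
```

The records of the second file report an FNR that has restarted at 1 while NR keeps counting, so the test is false for them. One caveat: if the first file is empty, NR == FNR is also true while the second file is read, so the idiom assumes a non-empty first file.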

There, the array is checked to see if the fields from the two files match. If so, the corresponding saved field and the current record are output.
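
The lookup step can be tried in isolation on a small data set. Everything in this sketch is an assumption made for illustration: the files lookup and data and their contents are invented, the field separator is set explicitly to ', ', and the membership test is written in the parenthesized form ($1, $2, $5) in a, which should be equivalent to joining the fields with SUBSEP by hand.

```shell
# Hypothetical data: "lookup" supplies the saved last field, "data" is filtered.
dir=$(mktemp -d)
printf '%s\n' \
  '1, 4, x, y, 25, 0, 42' \
  '2, 7, x, y, 30, 0, 99' > "$dir/lookup"
printf '%s\n' \
  '2, 7, p, q, 30, 5, 1' \
  '3, 3, p, q, 30, 5, 1' \
  '1, 4, r, s, 25, 6, 2' > "$dir/data"

joined=$(awk -F', ' '
BEGIN { OFS = ", " }
# First file: remember the last field under the key (field 1, field 2, field 5).
NR == FNR { a[$1, $2, $5] = $NF; next }
# Second file: print only records whose key was stored, saved field first.
($1, $2, $5) in a { print a[$1, $2, $5], $0 }
' "$dir/lookup" "$dir/data")

printf '%s\n' "$joined"
rm -rf "$dir"
```

Here the record starting with 3, 3 has no matching key and is skipped, while the other two records are printed with 99 and 42 prepended respectively.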
