
How can I compare two text files that have multiple fields in UNIX?

I have two text files.

  • file 1

     number,name,account id,vv,sfee,dac acc,TDID
     7000,john,2,0,0,1,6
     7001,elen,2,0,0,1,7
     7002,sami,2,0,0,1,6
     7003,mike,1,0,0,2,1
     8001,nike,1,2,4,1,8
     8002,paul,2,0,0,2,7

  • file 2

     number,account id,dac acc,TDID
     7000,2,1,6
     7001,2,1,7
     7002,2,1,6
     7003,1,2,1

I want to compare these two files. If the four columns of a row in file 2 are present in file 1 with equal values, I want output like this:

7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1

nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' file2.txt file1.txt

This works well for comparing a single column across two files, but I want to compare multiple columns. Does anyone have a suggestion?


EDIT: From the OP's comments:

nawk -F"," 'NR==FNR {a[$1];next} ($1 in a)' file2.txt file1.txt

This works well for comparing a single column across two files. I want to compare multiple columns. Do you have any suggestions?

This awk one-liner works for multiple columns on unsorted files:

awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt

For this to work, the first input file (file1.txt in my example) must be the one that has only 4 fields. Note that the file names are swapped relative to the question:

file1.txt

7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1

file2.txt

7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7

Output

$ awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1

Alternatively, you could use the following syntax, which more closely matches the one in your question but is not very readable, IMHO:

awk -F, 'NR==FNR{a[$1,$2,$3,$4];next} ($1SUBSEP$3SUBSEP$6SUBSEP$7 in a)' file1.txt file2.txt
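If the files still carry their header lines, as they do in the question, a variant along these lines might be easier to read. It is only a sketch: the function name match_rows and the file names keys.csv and data.csv are made up for illustration, and it assumes every file starts with exactly one header line. It uses awk's parenthesised `(i, j) in array` form in place of the explicit SUBSEP concatenation.

```shell
#!/bin/sh
# match_rows KEYFILE DATAFILE  (hypothetical helper name)
# Prints DATAFILE rows whose fields 1, 3, 6 and 7 equal fields 1-4 of a KEYFILE row.
match_rows() {
    awk -F, '
        FNR == 1  { next }                        # assumption: each file starts with a header
        NR == FNR { seen[$1, $2, $3, $4]; next }  # first file: remember every 4-field key
        ($1, $3, $6, $7) in seen                  # second file: print rows whose key was seen
    ' "$1" "$2"
}

# Demo with the data from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'number,account id,dac acc,TDID' \
    '7000,2,1,6' '7001,2,1,7' '7002,2,1,6' '7003,1,2,1' > "$tmp/keys.csv"
printf '%s\n' 'number,name,account id,vv,sfee,dac acc,TDID' \
    '7000,john,2,0,0,1,6' '7001,elen,2,0,0,1,7' '7002,sami,2,0,0,1,6' \
    '7003,mike,1,0,0,2,1' '8001,nike,1,2,4,1,8' '8002,paul,2,0,0,2,7' > "$tmp/data.csv"
match_rows "$tmp/keys.csv" "$tmp/data.csv"
rm -r "$tmp"
```

As with the one-liners above, the 4-field file goes first and neither file needs to be sorted.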

TxtSushi looks like what you want. It lets you work with CSV files using SQL.

It's not an elegant one-liner, but you could do it with Perl.

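#!/usr/bin/perl
# Usage: script.pl file1 file2   (file1 has 7 fields, file2 has 4 fields)
use strict;
use warnings;

my %k;

# Index the rows of file1 by their first field (number).
open my $A, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
while (my $line = <$A>) {
    chomp $line;
    my @f = split /,/, $line;
    $k{$f[0]} = [@f];
}
close $A;

# Print the file1 row when its fields 3, 6 and 7 equal fields 2, 3 and 4 of file2.
open my $B, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
while (my $line = <$B>) {
    chomp $line;
    my @f = split /,/, $line;
    my $row = $k{$f[0]} or next;
    print join(',', @$row), "\n"
        if $row->[2] eq $f[1]
        && $row->[5] eq $f[2]
        && $row->[6] eq $f[3];
}
close $B;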

Quick answer: use cut to extract the fields you need, then use diff to compare the results.

This is neither efficient nor pretty, but it will get the job done. It is not the most efficient implementation because it parses file1 multiple times; however, it does not read the entire file into RAM either, so it has some benefits over the simple scripting approaches.

sed -n '2,$p' file1 | awk -F, '{print $1 "," $3 "," $6 "," $7 " " $0 }' | \
sort | join file2 - |awk '{print $2}'

This works as follows:

  1. sed -n '2,$p' file1 sends file1 to STDOUT without the header line
  2. The first awk command prints the 4 "key fields" from file1 in the same format as they appear in file2, followed by a space, followed by the original line from file1
  3. The sort command ensures that file1 is in the same order as file2
  4. The join command joins file2 and STDOUT, writing only the records that have a matching record in file2
  5. The final awk command prints just the original part of file1

For this to work, you must make sure that file2 is sorted before running the command.

Running this against your example data gave the following result:

7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
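The sorting requirement can be folded into the pipeline itself. The following is a rough sketch rather than a tested replacement: the function name join_match is made up, it assumes both files have a single header line, and, like the original pipeline, it assumes no field contains a space. LC_ALL=C is set so that sort and join agree on the collation order.

```shell
#!/bin/sh
# join_match DATAFILE KEYFILE  (hypothetical helper name)
# DATAFILE has 7 fields per row, KEYFILE has 4; both are assumed to have a header line.
join_match() {
    jm_keys=$(mktemp)
    # Drop the header of the key file and sort it, so join sees ordered input.
    sed -n '2,$p' "$2" | LC_ALL=C sort > "$jm_keys"
    # Prefix each data row with its 4-field key, sort, join on that key,
    # then keep only the original row (the second space-separated field).
    sed -n '2,$p' "$1" |
        awk -F, '{ print $1 "," $3 "," $6 "," $7 " " $0 }' |
        LC_ALL=C sort |
        LC_ALL=C join "$jm_keys" - |
        awk '{ print $2 }'
    rm -f "$jm_keys"
}
```

It would be called as join_match file1 file2. The output comes out in sorted key order, not in the original order of file1.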

EDIT

I note from your comments that you are getting a sorting error. If this error occurs when sorting file2 before running the pipeline command, you could split the file, sort each part, and then cat the parts back together.

Something like this would do that for you:

mv file2 file2.orig
for i in 0 1 2 3 4 5 6 7 8 9
do
  grep "^${i}" file2.orig |sort > file2.$i
done
cat file2.[0-9] >file2
rm file2.[0-9] file2.orig

You may need to modify the values passed to the for loop if your file is not distributed evenly across the full range of leading digits.

Not really well tested, but this might work:

join -t, file1 file2 | awk -F, 'BEGIN{OFS=","} {if ($3==$8 && $6==$9 && $7==$10) print $1,$2,$3,$4,$5,$6,$7}'

(Of course, this assumes the input files are sorted.)
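If the inputs are not already sorted, one possible way to wrap the same idea is shown below. This is a sketch under assumptions: the function name join_fields is invented, both files are assumed to start with a header line, and temporary files hold the sorted copies.

```shell
#!/bin/sh
# join_fields FILE1 FILE2  (hypothetical helper name)
# FILE1 has 7 fields per row, FILE2 has 4; both are assumed to have a header line.
join_fields() {
    jf_a=$(mktemp)
    jf_b=$(mktemp)
    # Strip the headers and sort both files on the first comma-separated field.
    sed -n '2,$p' "$1" | LC_ALL=C sort -t, -k1,1 > "$jf_a"
    sed -n '2,$p' "$2" | LC_ALL=C sort -t, -k1,1 > "$jf_b"
    # After the join, fields 1-7 come from FILE1 and fields 8-10 from FILE2.
    LC_ALL=C join -t, "$jf_a" "$jf_b" |
        awk -F, 'BEGIN { OFS = "," }
                 $3 == $8 && $6 == $9 && $7 == $10 { print $1, $2, $3, $4, $5, $6, $7 }'
    rm -f "$jf_a" "$jf_b"
}
```

Because join pairs every FILE1 row with every FILE2 row that shares the same number, repeated numbers in FILE1 are handled: the awk filter then keeps only the pairs whose other three fields also agree.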

The statistical package R handles processing multiple CSV tables really easily. See An Intro. to R or R for Beginners.
