
Linux: Comparing two files by content only, regardless of line position

I am trying to use the comm or diff Linux commands to compare two different files. Each file has a list of volume names. File A has 1500 volumes, and File B has those same 1500 volumes plus another 200, for a total of 1700. I am looking for a way to find just those 200 volumes. I don't care whether matching volumes are on different lines; I only want the mismatched volumes, but the diff and comm commands seem to compare only line by line. Does anyone know another command, or a way to use comm or diff, to find these 200 volumes?

First 5 lines of both files (BTW, there is only one volume on each line, so File A has 1500 lines and File B has 1700 lines):

File A:

B00004
B00007
B00010
B00011
B00013

File B:

B00003   
B00004   
B00007    
B00008    
B00010 

So, from the first 5 lines alone, I would want the command to show me B00003 and B00008, because those volumes are not in File A.

Try

comm -23 <( sort largerFile) <(sort smallerFile) 

This assumes that your volume name is the first "field" in the data. If not, check man sort for ways to sort files on alternate fields (and combinations of fields).

The <( ....) construct is known as process substitution. If you're using a really old shell/Unix or a reduced-functionality shell (dash?), process substitution may not be available. In that case you'll have to sort your files before you run comm, and decide what to do with the unsorted originals.
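For a shell without process substitution, the sort-first approach might look like the sketch below. The file names, the temporary directory, and the five-line sample contents are assumptions made for illustration (the contents are copied from the question), and mktemp -d is assumed to be available, as it is on typical Linux systems.

```shell
#!/bin/sh
# Sketch: comm without process substitution, for plain sh (e.g. dash).
# fileA/fileB are stand-ins holding the sample lines from the question.
tmpdir=$(mktemp -d)
printf '%s\n' B00004 B00007 B00010 B00011 B00013 > "$tmpdir/fileA"
printf '%s\n' B00003 B00004 B00007 B00008 B00010 > "$tmpdir/fileB"

# Write sorted copies to separate files so the originals stay untouched.
sort "$tmpdir/fileB" > "$tmpdir/fileB.sorted"
sort "$tmpdir/fileA" > "$tmpdir/fileA.sorted"

# Keep only the lines unique to the first (larger) file.
only_in_b=$(comm -23 "$tmpdir/fileB.sorted" "$tmpdir/fileA.sorted")
printf '%s\n' "$only_in_b"

# Remove the temporary sorted copies.
rm -rf "$tmpdir"
```

On the sample data this prints B00003 and B00008, the two volumes that appear only in File B.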

Note that comm -23 means "suppress lines unique to the 2nd file" (-2) and "suppress lines common to both files" (-3), so the remaining output is the lines found in file1 that are not in file2. This is why I list largerFile first.
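To make the column options concrete, here is a small sketch that runs comm three ways on the sample lines from the question. The file names and temporary directory are assumptions for illustration; the sample lines are already in sorted order, so no sort step is shown.

```shell
#!/bin/sh
# Sample data from the question; both lists are already sorted.
dir=$(mktemp -d)
printf '%s\n' B00004 B00007 B00010 B00011 B00013 > "$dir/fileA"
printf '%s\n' B00003 B00004 B00007 B00008 B00010 > "$dir/fileB"

# fileB is given first, so it plays the role of "file1".
only_b=$(comm -23 "$dir/fileB" "$dir/fileA")  # keep column 1: only in fileB
only_a=$(comm -13 "$dir/fileB" "$dir/fileA")  # keep column 2: only in fileA
both=$(comm -12 "$dir/fileB" "$dir/fileA")    # keep column 3: in both files

printf 'only in B:\n%s\n' "$only_b"
printf 'only in A:\n%s\n' "$only_a"
printf 'in both:\n%s\n' "$both"

rm -rf "$dir"
```

Each digit after the dash names a column to suppress, so -23 leaves column 1, -13 leaves column 2, and -12 leaves column 3.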

IHTH

awk can also help.

 awk  'NR==FNR {a[$1]=$1; next}!($1 in a) {print $0}' fileA fileB
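The one-liner above can be unpacked roughly as follows. This is a commented sketch of the same idea rather than a different method: the array name seen and the sample files are assumptions for illustration, and it prints $1 instead of $0 so that stray trailing blanks (added to the sample File B here on purpose) do not end up in the output. Because awk splits fields on whitespace by default, such blanks do not affect the comparison, and no sorting is needed.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' B00004 B00007 B00010 B00011 B00013 > "$dir/fileA"
# Sample File B with trailing spaces added on some lines.
printf '%s\n' 'B00003   ' 'B00004   ' 'B00007    ' 'B00008    ' 'B00010 ' > "$dir/fileB"

missing=$(awk '
    # First file (NR == FNR): remember each volume name, then skip ahead.
    NR == FNR { seen[$1]; next }
    # Second file: print the name when it was not seen in the first file.
    !($1 in seen) { print $1 }
' "$dir/fileA" "$dir/fileB")
printf '%s\n' "$missing"

rm -rf "$dir"
```

On the sample data this prints B00003 and B00008. Note that the smaller file is listed first here, the opposite of the comm example.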

