How to extract missing lines by comparing two different files in Linux?

Question

I have two different files, and some rows are missing from one of them. I want to create a new file containing the rows that the two files do not have in common. As an example, I have the following files:

file1:

id1 
id22 
id3 
id4 
id43 
id100 
id433

file2:

id1
id2
id22
id3
id4
id8
id43
id100
id433
id21

I want to extract the rows that exist in file2 but not in file1:

new file:

 id2
 id8 
 id21

Any suggestions, please?

Answer 1

Use the comm utility (assumes bash as the shell):

comm -13 <(sort file1) <(sort file2)

Note that the input must be sorted for this to work, so the resulting delta will be sorted, too.

comm uses an (interleaved) 3-column layout:

column 1: lines only in file1
column 2: lines only in file2
column 3: lines in both files

-13 suppresses columns 1 and 3, so only the lines exclusive to file2 are printed.
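To make the column layout concrete, here is a minimal sketch that runs comm on two tiny, already sorted files. The temporary directory and the names left and right are illustrative, not part of the question.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf '%s\n' a b c > "$tmpdir/left"    # already sorted
printf '%s\n' b c d > "$tmpdir/right"   # already sorted

# No options: all three columns, each one indented by one more tab.
all_columns=$(comm "$tmpdir/left" "$tmpdir/right")

# -13 hides columns 1 and 3, leaving only lines unique to the second file.
only_right=$(comm -13 "$tmpdir/left" "$tmpdir/right")

printf '%s\n' "$all_columns"
printf '%s\n' "$only_right"
rm -rf "$tmpdir"
```

In the unfiltered output, a appears unindented (column 1), d is indented by one tab (column 2), and b and c are indented by two tabs (column 3). With -13 only d remains.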

Caveat: For lines to be recognized as common to both files, they must match exactly. Seemingly identical lines that differ in whitespace will not match (as is the case in the sample data in the question as of this writing, where file1 lines have a trailing space).

cat -et is a command that visualizes line endings and control characters, which is helpful in diagnosing such problems.

For instance, cat -et file1 would output lines such as id1 $, making the trailing space visible just before the $ that marks the end of the line.
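A small sketch of this diagnostic, using printf to produce lines that look identical on screen but differ in trailing whitespace. The variable names are illustrative.

```shell
# Three inputs that look alike on screen: trailing space, no trailing
# whitespace, and a trailing tab.
with_space=$(printf 'id1 \n' | cat -et)
without_space=$(printf 'id1\n' | cat -et)
with_tab=$(printf 'id1\t\n' | cat -et)

printf '%s\n' "$with_space"      # id1 $
printf '%s\n' "$without_space"   # id1$
printf '%s\n' "$with_tab"        # id1^I$
```

The $ marks where each line ends, so anything between the visible text and the $ is stray whitespace. Tabs are shown as ^I.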

If, instead of cleaning up file1, you want to compare the files as they are, strip the trailing spaces on the fly:

comm -13 <(sed -E 's/ +$//' file1 | sort) <(sort file2)

A generalized solution that trims leading and trailing whitespace from the lines of both files:

comm -13 <(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file1 | sort) \
         <(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file2 | sort)

Note: The above sed commands require either GNU or BSD sed.
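Because comm needs sorted input, its output comes out in sort order rather than in file2's original order. If the original order matters, one alternative is a two-pass awk lookup. This is a different technique from the comm approach, sketched here on the question's sample data recreated in a temporary directory (the paths are illustrative).

```shell
tmpdir=$(mktemp -d)
# file1 lines carry a trailing space, as in the question.
printf '%s \n' id1 id22 id3 id4 id43 id100 id433 > "$tmpdir/file1"
printf '%s\n' id1 id2 id22 id3 id4 id8 id43 id100 id433 id21 > "$tmpdir/file2"

# First pass (NR == FNR): remember every trimmed line of file1.
# Second pass: print the trimmed lines of file2 that were never seen.
missing=$(awk '
    { gsub(/^[ \t]+|[ \t]+$/, "") }   # trim leading/trailing blanks
    NR == FNR { seen[$0] = 1; next }
    !($0 in seen)
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$missing"
rm -rf "$tmpdir"
```

This prints id2, id8 and id21 in the order they appear in file2. The sketch assumes file1 is not empty, since the NR == FNR test cannot tell the two files apart otherwise.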

Answer 2

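You can try sorting both files together, then counting duplicate lines and selecting only those lines with a count of 1. Note that this reports lines unique to either file, and it assumes that no line is repeated within a single file and that lines match exactly, including whitespace.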

sort file1 file2 | uniq -c | awk '$1 == 1 {print $2}'
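A quick way to see what this pipeline reports is to run it on two tiny inputs. The sketch below also runs uniq -u, which prints only lines that are not repeated and so gives the same result for single-word lines like these. The file names a and b are illustrative.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' id1 id3 id9 > "$tmpdir/a"
printf '%s\n' id1 id2 id3 > "$tmpdir/b"

# Lines that occur exactly once across both files combined.
counted=$(sort "$tmpdir/a" "$tmpdir/b" | uniq -c | awk '$1 == 1 {print $2}')

# Same idea without the counting step.
unrepeated=$(sort "$tmpdir/a" "$tmpdir/b" | uniq -u)

printf '%s\n' "$counted"
rm -rf "$tmpdir"
```

Both variables end up holding id2 and id9. The id9 line exists only in the first file, so this approach matches the question's desired output only when file1 contains nothing that file2 lacks.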

Answer 1: score 2, accepted, 2015-10-16 19:22:19

Answer 2: score 1, 2015-10-16 20:12:29