Perl: print the differences between two files

Question

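I'm trying to compare two files and output the differences. My code below works for items that exist in file 1 but are missing from file 2, but not for items in file 2 that are missing from file 1. I tried swapping file1 and file2, but that doesn't work. Thanks in advance.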

use warnings;
use strict;
my $file1 = '1.txt';
my $file2 = '2.txt';



open my $fh, '<', $file2 or die $!;
my $file = [<$fh>];
open $fh, '<', $file1 or die $!;
while(my $line = <$fh>) {
    chomp($line);
#print "$line\n";

    my $status = 0;
    for (@{$file}) {
        chomp;
        if (/$line/) {
            $status = 1;
            last;
        }
    }
    print $line, $/ if $status == 0 
}

File 1:

File 2:

Answer 1

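There are several problems with your code.

Looking at the files, I see that file2 has some trailing spaces. Since file1 doesn't have them, a line such as '1103 ' can never match a line of the first file, which has no trailing space. That is why swapping the files doesn't help: the lines of file2 then become the patterns, and '1103 ' never matches '1103'.

chomp only removes the trailing newline (if there is one), so it doesn't help with trailing spaces.

Instead of chomp, I would use a regular expression that removes all whitespace characters at the end of the line: s/\s+$//. Since \s also matches the newline, this replaces chomp entirely.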
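A small sketch of the difference between the two, using a hard-coded string that stands in for a line of file2 with a trailing space (the sample string is an assumption, not read from the actual files):

```perl
use strict;
use warnings;

# Assumed sample: a line from file2 with a trailing space before the newline
my $from_file = "1103 \n";

my $chomped = $from_file;
chomp $chomped;               # removes only the trailing newline

my $trimmed = $from_file;
$trimmed =~ s/\s+$//;         # removes the newline and any trailing whitespace

print "chomp: '$chomped'\n";  # chomp: '1103 '
print "s///:  '$trimmed'\n";  # s///:  '1103'
```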

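Also, you are comparing the lines with a regular expression. Without anchors or word boundaries this is a problem: a 1 from the first file would match 123 from the second file, which is wrong.

I would use eq to compare the two lines.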
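To make the 1 vs. 123 problem concrete, this sketch compares an unanchored regex, an anchored regex with \Q...\E (which also escapes any regex metacharacters in the line), and eq. The two sample values are assumptions chosen to mirror the example above:

```perl
use strict;
use warnings;

my $line  = '1';      # assumed line from the first file
my $other = '123';    # assumed line from the second file

# Unanchored regex: true, because '1' occurs somewhere inside '123'
my $regex_match = ($other =~ /$line/) ? 1 : 0;

# Anchored, quoted regex: only matches when the whole line is equal
my $anchored_match = ($other =~ /^\Q$line\E$/) ? 1 : 0;

# String equality: true only when the two lines are identical
my $eq_match = ($other eq $line) ? 1 : 0;

print "regex: $regex_match, anchored: $anchored_match, eq: $eq_match\n";
# regex: 1, anchored: 0, eq: 0
```

For whole-line comparison, eq is the simplest choice; the anchored form is only worth it if you actually need pattern features.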

So here is the script with the changes:

use warnings;
use strict;
my $file1 = '2.txt';  # Exchanged files to test the non-working case
my $file2 = '1.txt';

open my $fh, '<', $file2 or die $!;
my $file = [<$fh>];
open $fh, '<', $file1 or die $!;
while(my $line = <$fh>) {
    $line =~ s/\s+$//;    # changed to remove all space-like trailing characters

    my $status = 0;
    for (@{$file}) {
        s/\s+$//;    # changed to remove all space-like trailing characters
        if ($_ eq $line) {    # changed to use a regular comparison
            $status = 1;
            last;
        }
    }
    print $line, $/ if $status == 0 
}

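An extra tip:

You don't actually need an array reference to hold the lines of the file you read into memory. A plain array is enough, and it avoids dereferencing in the for loop.

So you can change these lines to: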

...
my @file_content = <$fh>;
...
for (@file_content) {
...

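One more tip:

For large files the code may be too slow, because it compares every line of one file with every line of the other, so its cost is O(n^2).

You may want to use one of the techniques described here.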
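One common technique (the one in perlfaq4, "How do I compute the difference of two arrays?") is to load one list into a hash and do a single lookup per line, which brings the cost down to roughly O(n + m). A minimal sketch, assuming both lists are already trimmed; `missing_in_second` is a hypothetical helper name and the sample values are a small subset chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the items of @$first that do not appear in
# @$second, using one hash lookup per item instead of a nested loop.
sub missing_in_second {
    my ($first, $second) = @_;
    my %in_second = map { $_ => 1 } @$second;   # build the lookup table once
    return grep { !$in_second{$_} } @$first;    # keep items with no match
}

# Assumed sample data: lines already stripped of trailing whitespace
my @file1 = qw(1103 21 7308 16008);
my @file2 = qw(1103 21 15211 16024);

my @missing_from_file2 = missing_in_second(\@file1, \@file2);
my @missing_from_file1 = missing_in_second(\@file2, \@file1);

print "In file1 but not file2: @missing_from_file2\n";   # 7308 16008
print "In file2 but not file1: @missing_from_file1\n";   # 15211 16024
```

Calling the helper twice with the arguments swapped gives both directions, which is what the original question asks for.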

Answer 2

As I understand it, you don't need a line-by-line match; you need to compare the numbers in the two files as sets.

1) Open the files.

2) Store the contents of each file in an array.

3) Compare the two arrays.

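   use strict;
   use warnings;
   use Array::Utils qw(:all);

   # Values from the two files, hard-coded here for brevity
   my @file_arr1 = qw(15122 16070 61 15106 16704 15105 7303 15201 21 16712 7308 16029 16008 16023 16025 16044 16045 16042 16043 16040 16041 16226 15112 16914 16915 31 16910 16911 16912 16913 16114 7505 1103 16018 16916);

   my @file_arr2 = qw(1103 15105 15106 15112 15201 15211 16024 16029 16044 16051 16070 16201 16225 16350 21 31 61 7303 7505);

   # array_diff: symmetric difference (values present in only one of the arrays)
   my @unmatched_arr = array_diff(@file_arr1, @file_arr2);

   # intersect: values present in both arrays (unique() would return the union instead)
   my @matched_arr = intersect(@file_arr1, @file_arr2);

   print join("\n", @unmatched_arr), "\n";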
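If installing Array::Utils isn't an option, a count hash in core Perl (again in the spirit of perlfaq4) gives the same two results. This is a sketch with a small assumed subset of the values; each array is de-duplicated first so that a value repeated within one file isn't mistaken for a match:

```perl
use strict;
use warnings;

# Assumed sample values (a small subset of the two files)
my @file_arr1 = qw(1103 21 7308 16008);
my @file_arr2 = qw(1103 21 15211 16024);

# De-duplicate each array so every value is counted at most once per file
my %in_file1 = map { $_ => 1 } @file_arr1;
my %in_file2 = map { $_ => 1 } @file_arr2;

# Count in how many files each value appears: 2 = both, 1 = only one
my %count;
$count{$_}++ for keys(%in_file1), keys(%in_file2);

my (@matched, @unmatched);
for my $value (sort keys %count) {
    if ($count{$value} == 2) { push @matched,   $value }   # like intersect()
    else                     { push @unmatched, $value }   # like array_diff()
}

print "matched:   @matched\n";     # 1103 21
print "unmatched: @unmatched\n";   # 15211 16008 16024 7308
```

The plain sort here compares strings, which is why 7308 comes last; use sort { $a <=> $b } if you want numeric order.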

Thanks.

Answer 3

Another way to solve this is with the List::Compare module (https://metacpan.org/pod/List::Compare).

Script

use strict;
use warnings;

use File::Grep qw( fmap );
use String::Util qw(trim);
use List::Compare;
use Data::Dumper;

my $file_1 = "file1.txt";
my $file_2 = "file2.txt";

#fmap BLOCK LIST : Performs a map operation on the files in LIST, 
#using BLOCK as the mapping function. The results from BLOCK will be 
#appended to the list that is returned at the end of the call.
# trim : Returns the string with all leading and trailing whitespace removed.
my @data1= fmap { trim($_)  } $file_1;
my @data2= fmap { trim($_)  } $file_2;

#Create a List::Compare object. Put the two lists into arrays (named or anonymous) 
# and pass references to the arrays to the constructor.
my $diff_file1 = List::Compare->new(\@data1, \@data2);
#get_unique() : Get those items which appear (at least once) only in the first list.
my @data_missing_file2 = $diff_file1->get_unique;

my $diff_file2 = List::Compare->new(\@data2, \@data1);
my @data_missing_file1 = $diff_file2->get_unique;

print "Data missing in file2 which present in file1 : ",  Dumper(\@data_missing_file2) , "\n";
print "Data missing in file1 which present in file2: ", Dumper(\@data_missing_file1) , "\n";
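File::Grep and String::Util are not core modules. If they aren't installed, the two fmap { trim($_) } calls could be replaced with core Perl along these lines. This is a sketch: read_trimmed_lines is a hypothetical name, and unlike fmap with trim it also drops blank lines. The temporary file only exists to demonstrate the helper:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical core-only stand-in for: fmap { trim($_) } $file
# Returns the file's lines with leading and trailing whitespace removed,
# skipping lines that are empty after trimming.
sub read_trimmed_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;            # trim both ends, newline included
        push @lines, $line if length $line; # skip blank lines
    }
    close $fh;
    return @lines;
}

# Demo with a temporary file containing trailing spaces and a blank line
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "1103 \n  21\n\n7308\n";
close $tmp_fh;

my @data = read_trimmed_lines($tmp_path);
print join(',', @data), "\n";   # 1103,21,7308
```

With this helper, `my @data1 = read_trimmed_lines($file_1);` can be passed to List::Compare exactly as before.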

Output

Data missing in file2 which present in file1 : $VAR1 = [
          '15122',
          '16008',
          '16018',
          '16023',
          '16025',
          '16040',
          '16041',
          '16042',
          '16043',
          '16045',
          '16114',
          '16226',
          '16704',
          '16712',
          '16910',
          '16911',
          '16912',
          '16913',
          '16914',
          '16915',
          '16916',
          '7308'
        ];

Data missing in file1 which present in file2: $VAR1 = [
          '15211',
          '16024',
          '16051',
          '16201',
          '16225',
          '16350'
        ];

3 solutions

Solution 1: score 2, 2020-08-16 10:42:45

Solution 2: score 1, 2020-08-16 09:47:04

Solution 3: score 0, 2020-08-20 13:53:57
