
Perl print diff of two files

I am trying to diff two files and print the differences. My code below works for items that exist in file1 but are missing from file2, but not for items in file2 that are missing from file1. I tried swapping file1 and file2, but that did not work either. Thanks in advance.

use warnings;
use strict;
my $file1 = '1.txt';
my $file2 = '2.txt';
open my $fh, '<', $file2 or die $!;
my $file = [<$fh>];
open $fh, '<', $file1 or die $!;
while(my $line = <$fh>) {
    chomp($line);
#print "$line\n";

    my $status = 0;
    for (@{$file}) {
        chomp;
        if (/$line/) {
            $status = 1;
            last;
        }
    }
    print $line, $/ if $status == 0 
}

File1:

15122
16070
61
15106
16704
15105
7303
15201
21
16712
7308
16029
16008
16023
16025
16044
16045
16042
16043
16040
16041
16226
15112
16914
16915
31
16910
16911
16912
16913
16114
7505
1103
16018
16916

File2:

1103 
15105 
15106 
15112 
15201 
15211 
16024 
16029 
16044 
16051 
16070 
16201 
16225 
16350 
21 
31 
61 
7303 
7505 

There are a couple of problems in your code.

After inspecting the files, I see that file2 has trailing spaces and file1 does not. When you swap the files, the pattern becomes '1103 ' (with the space), which can never match the lines of file1, since they have no trailing spaces.

chomp only removes a trailing newline (strictly, the current value of $/) if one is present, so it won't help with the trailing spaces.

Instead of chomp, I would use a regular expression to remove all trailing whitespace characters, the newline included. You can use s/\s+$// for this.
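To see the difference, here is a small sketch that applies both chomp and the substitution to one line as it would be read from file2 (the sample string "1103 \n" is an assumption modelled on the data above):

```perl
use strict;
use warnings;

# A line as read from file2: the value, a trailing space, then the newline
my $raw = "1103 \n";

my $chomped = $raw;
chomp $chomped;           # removes only the trailing "\n" ($/)

my $trimmed = $raw;
$trimmed =~ s/\s+$//;     # removes all trailing whitespace, "\n" included

print "chomp: [$chomped]\n";    # prints: chomp: [1103 ]
print "s///:  [$trimmed]\n";    # prints: s///:  [1103]
```

Only the trimmed value is equal to the "1103" read from file1.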

Also, you are comparing the lines with a regex match. That is problematic unless you add word boundaries (\b) or anchors, because otherwise a 1 from the first file will match 123 in the second file, which is incorrect.
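A minimal sketch of the false positive, using the hypothetical values 1 and 123 from the paragraph above and comparing an unanchored match, a match with word boundaries, and eq:

```perl
use strict;
use warnings;

my $needle   = '1';
my $haystack = '123';

# Unanchored regex: matches because "123" contains "1"
my $unanchored = $haystack =~ /$needle/ ? 1 : 0;

# Word boundaries on both sides (\Q...\E escapes any regex metacharacters)
my $bounded = $haystack =~ /\b\Q$needle\E\b/ ? 1 : 0;

# Plain string equality, which is what this comparison really needs
my $equal = $haystack eq $needle ? 1 : 0;

print "unanchored=$unanchored bounded=$bounded eq=$equal\n";
# prints: unanchored=1 bounded=0 eq=0
```

Note that word boundaries would still accept "1103 " against 1103, which is why trimming the lines and then using eq is the simpler fix.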

I would compare the two lines with eq instead.

Here is the script with those changes:

use warnings;
use strict;
my $file1 = '2.txt';  # Exchanged files to test the non-working case
my $file2 = '1.txt';

open my $fh, '<', $file2 or die $!;
my $file = [<$fh>];
open $fh, '<', $file1 or die $!;
while(my $line = <$fh>) {
    $line =~ s/\s+$//;    # changed to remove all trailing whitespace, newline included

    my $status = 0;
    for (@{$file}) {
        s/\s+$//;    # changed to remove all trailing whitespace, newline included
        if ($_ eq $line) {    # changed to use a plain string comparison
            $status = 1;
            last;
        }
    }
    print $line, $/ if $status == 0;
}

Extra tip:

You don't actually need an array reference for the file contents. A plain array is enough, and it avoids the dereferencing in the for loop.

So you could change those lines to:

...
my @file_content = <$fh>;
...
for (@file_content) {
...

Yet another tip:

The code may be too slow for large files, since the nested loops make it O(n*m), i.e. quadratic in the number of lines.

You may want to use one of the techniques described here.
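One common technique (the hash-lookup idea also described in perlfaq4, not necessarily the one the link pointed to) is to load one file into a hash and look up each value of the other file, which makes the comparison linear. This is a sketch, assuming one value per line and taking the two file names from the command line; read_values and missing_from are hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical helper: read one value per line, stripping trailing
# whitespace (including the newline) and skipping blank lines.
sub read_values {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @values;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;
        push @values, $line if length $line;
    }
    close $fh;
    return @values;
}

# Hypothetical helper: return the items of @$left that are not in @$right,
# keeping the order of @$left. Building the hash costs O(m) and each lookup
# is O(1) on average, so the whole comparison is O(n + m).
sub missing_from {
    my ($left, $right) = @_;
    my %in_right;
    @in_right{@$right} = ();    # hash slice: each value of @$right becomes a key
    return grep { !exists $in_right{$_} } @$left;
}

# Only run when two file names are given on the command line
if (@ARGV == 2) {
    my ($path1, $path2) = @ARGV;
    my @file1 = read_values($path1);
    my @file2 = read_values($path2);

    print "In $path1 but not in $path2:\n";
    print "$_\n" for missing_from(\@file1, \@file2);

    print "In $path2 but not in $path1:\n";
    print "$_\n" for missing_from(\@file2, \@file1);
}
```

Usage would be something like perl compare.pl 1.txt 2.txt (the script name is hypothetical). Both directions are printed in one run, so there is no need to swap the files.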

As I understand it, this is not really a line-by-line match but a comparison of the numbers in one file against the numbers in the other.

1) Open the files.

2) Store the contents of each file in its own array.

3) Compare the two arrays.

   use strict;
   use warnings;
   use Array::Utils qw(:all);

   # File contents hard-coded here; in practice, read them from the two files
   my @file_arr1 = qw(15122 16070 61 15106 16704 15105 7303 15201 21 16712 7308 16029 16008 16023 16025 16044 16045 16042 16043 16040 16041 16226 15112 16914 16915 31 16910 16911 16912 16913 16114 7505 1103 16018 16916);

   my @file_arr2 = qw(1103 15105 15106 15112 15201 15211 16024 16029 16044 16051 16070 16201 16225 16350 21 31 61 7303 7505);

   # array_diff returns the symmetric difference (items found in only one array)
   my @unmatched_arr = array_diff(@file_arr1, @file_arr2);

   # intersect returns the items found in both arrays
   my @matched_arr = intersect(@file_arr1, @file_arr2);

   print join("\n", @unmatched_arr), "\n";
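If installing Array::Utils is not an option, the same matched/unmatched split can be computed in core Perl with a counting hash. This is a swapped-in technique, sketched on a small hard-coded subset of the data and assuming no value is repeated within a single file:

```perl
use strict;
use warnings;

# Small subsets of the two files, hard-coded for illustration
my @file_arr1 = qw(15122 16070 61 1103 16018);
my @file_arr2 = qw(1103 15211 16070 61);

# Count in how many of the two arrays each value appears
# (assumes no value is repeated within a single array)
my %count;
$count{$_}++ for @file_arr1, @file_arr2;

# Walk both arrays once more, keeping the order of first appearance
my (%done, @unmatched_arr, @matched_arr);
for my $item (@file_arr1, @file_arr2) {
    next if $done{$item}++;
    if ($count{$item} == 1) { push @unmatched_arr, $item }
    else                    { push @matched_arr,   $item }
}

print "Unmatched: @unmatched_arr\n";    # prints: Unmatched: 15122 16018 15211
print "Matched:   @matched_arr\n";      # prints: Matched:   16070 61 1103
```

A count of 1 means the value appears in only one of the arrays (the symmetric difference), and a count of 2 means it appears in both (the intersection).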

Thanks.

Another way to solve this is with the List::Compare module: https://metacpan.org/pod/List::Compare

Script

use strict;
use warnings;

use File::Grep qw( fmap );
use String::Util qw(trim);
use List::Compare;
use Data::Dumper;

my $file_1 = "file1.txt";
my $file_2 = "file2.txt";

#fmap BLOCK LIST : Performs a map operation on the files in LIST, 
#using BLOCK as the mapping function. The results from BLOCK will be 
#appended to the list that is returned at the end of the call.
# trim : Returns the string with all leading and trailing whitespace removed.
my @data1= fmap { trim($_)  } $file_1;
my @data2= fmap { trim($_)  } $file_2;

#Create a List::Compare object. Put the two lists into arrays (named or anonymous) 
# and pass references to the arrays to the constructor.
my $diff_file1 = List::Compare->new(\@data1, \@data2);
#get_unique() : Get those items which appear (at least once) only in the first list.
my @data_missing_file2 = $diff_file1->get_unique;

my $diff_file2 = List::Compare->new(\@data2, \@data1);
my @data_missing_file1 = $diff_file2->get_unique;

print "Data missing in file2 which present in file1 : ",  Dumper(\@data_missing_file2) , "\n";
print "Data missing in file1 which present in file2: ", Dumper(\@data_missing_file1) , "\n";

Output

Data missing in file2 which present in file1 : $VAR1 = [
          '15122',
          '16008',
          '16018',
          '16023',
          '16025',
          '16040',
          '16041',
          '16042',
          '16043',
          '16045',
          '16114',
          '16226',
          '16704',
          '16712',
          '16910',
          '16911',
          '16912',
          '16913',
          '16914',
          '16915',
          '16916',
          '7308'
        ];

Data missing in file1 which present in file2: $VAR1 = [
          '15211',
          '16024',
          '16051',
          '16201',
          '16225',
          '16350'
        ];

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
