
Perl: how to compare two files?

I am trying to write a Perl script that compares two text files. The differences between the files should be printed to the file error.txt, together with the line number.

Example:

File 1:

Figure 1.
Somatotropes are organized into.
Figure 2.
Comparing two xml files organized into.
Figure 3.
Somatotropes presentation of GH1,

File 2:

Figure 1.
children with acquired organized into.
Figure 2.
Severe anterior hypoplasia,
Figure 3.
Somatotropes presentation of GH1,

Output required in error.txt:

Error:lineno:2 please check mismatch
Error:lineno:4 please check mismatch

This is my code so far:

use strict;
use warnings;
use Text::Diff;

my $file1 = 'file1.txt';
my $file2 = 'file2.txt';
my $error = 'error.txt';

open(my $in1, '<', $file1) or die "Cannot open file '$file1' for reading: $!";
open(my $in2, '<', $file2) or die "Cannot open file '$file2' for reading: $!";
open(my $out, '>', $error) or die "Cannot open file '$error' for writing: $!";

my $lineno = 1;

while (my $line1 = <$in1>)
{
    my $line2 = <$in2>;

    printf $out "Error:lineno:%d please check mismatch\n", $lineno
        unless $line1 eq $line2;

    ++$lineno;
}

# the logic might be: match line by line and, for whatever mismatch is found, grab
# the position (the line number) and print it in error.txt

# Write the Text::Diff output while $out is still open
my $diff = diff $file1, $file2;

print $out $diff;

close $out or die "Cannot close file '$error': $!";
close $in2 or die "Cannot close file '$file2': $!";
close $in1 or die "Cannot close file '$file1': $!";

Here is a simple example:

#!/usr/bin/perl
use strict;
use warnings;

open(FILE,"file1.txt");
my @file1 = <FILE>;
close FILE;
open(FILE,"file2.txt");
my @file2 = <FILE>;
close FILE;

my @errors = ();

for(my $line = 0; $line < scalar(@file1); $line++){
    if($file1[$line] ne $file2[$line]){
        push(@errors, "Error:lineno:".($line+1));
    }
}


open(ERROR,">","error.txt");
foreach(@errors){
    print ERROR $_."\n";
}
close ERROR;

First it opens the files and reads them into arrays. Then, in a loop, it compares each line and, if the lines differ, pushes a message onto the errors array. At the end it writes the errors to your error file.

The code will fail on files of different sizes. I'll let you implement that handling, along with the error checking.
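One way to handle files of different lengths, as a minimal sketch: loop up to the length of the longer file and treat a missing line as a mismatch. The helper names `compare_lines` and `read_lines` are hypothetical, and the message format simply copies the one requested in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare two array refs of lines position by position
# and return one message per mismatching line number.
sub compare_lines {
    my ($lines1, $lines2) = @_;
    my $max = @$lines1 > @$lines2 ? @$lines1 : @$lines2;
    my @errors;
    for my $i (0 .. $max - 1) {
        my ($l1, $l2) = ($lines1->[$i], $lines2->[$i]);
        # A line that is missing from the shorter file counts as a mismatch
        if (!defined $l1 || !defined $l2 || $l1 ne $l2) {
            push @errors, "Error:lineno:" . ($i + 1) . " please check mismatch";
        }
    }
    return @errors;
}

# Hypothetical helper: read a whole file into a list of lines, dying on failure.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path' for reading: $!";
    my @lines = <$fh>;
    close $fh or die "Cannot close '$path': $!";
    return @lines;
}

# Run the comparison only when both input files are present
if (-e 'file1.txt' && -e 'file2.txt') {
    my @file1 = read_lines('file1.txt');
    my @file2 = read_lines('file2.txt');
    open(my $err, '>', 'error.txt') or die "Cannot open 'error.txt' for writing: $!";
    print {$err} "$_\n" for compare_lines(\@file1, \@file2);
    close $err or die "Cannot close 'error.txt': $!";
}
```

Note that the lines are compared with their newlines intact, so a final line that lacks a trailing newline in only one file is reported as a mismatch.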

What type of diff are you attempting? Are you assuming that the two files have the same number of lines? In a true diff, you have to assume that lines might not always line up. Let's look at these two files:

File #1

Line #1
Line #2
FOOBAR!
Line #3
Line #4

File #2

Line #1
FOOBAR!
Line #2
Line #3
Line #4

We look at this and say, "In file #1, there's an added line FOOBAR! between Line #2 and Line #3. In file #2, this line is between Line #1 and Line #2." A diff program would say these files are pretty much identical except for that FOOBAR! line.

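However, if I did a line-by-line comparison, I would report lines 2 and 3 as different even though only a single line moved. Had FOOBAR! been inserted into just one of the files, every line after the insertion point would be flagged.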
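A diff-style comparison aligns the two files on their longest common subsequence (LCS) rather than on line position. The following is a minimal sketch of that idea in plain Perl. The helper name `lcs_diff` and its output format are hypothetical, and real diff tools use more refined algorithms than this quadratic table.

```perl
use strict;
use warnings;

# Hypothetical helper: a small LCS-based diff over two array refs of lines.
# Returns strings of the form "- N: text" for lines only in the first list
# and "+ N: text" for lines only in the second (N is the 1-based line number).
sub lcs_diff {
    my ($old, $new) = @_;
    my ($n, $m) = (scalar @$old, scalar @$new);

    # $len[$i][$j] = length of the LCS of $old->[$i..] and $new->[$j..]
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            if ($old->[$i] eq $new->[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($skip_old, $skip_new) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $skip_old > $skip_new ? $skip_old : $skip_new;
            }
        }
    }

    # Walk the table from the start, emitting additions and removals
    my @out;
    my ($i, $j) = (0, 0);
    while ($i < $n && $j < $m) {
        if ($old->[$i] eq $new->[$j]) {
            $i++;
            $j++;
        }
        elsif ($len[$i][$j + 1] >= $len[$i + 1][$j]) {
            push @out, "+ " . ($j + 1) . ": $new->[$j]";
            $j++;
        }
        else {
            push @out, "- " . ($i + 1) . ": $old->[$i]";
            $i++;
        }
    }
    # Whatever remains in either list is unmatched
    while ($i < $n) { push @out, "- " . ($i + 1) . ": $old->[$i]"; $i++; }
    while ($j < $m) { push @out, "+ " . ($j + 1) . ": $new->[$j]"; $j++; }
    return @out;
}

my @file1 = ('Line #1', 'Line #2', 'FOOBAR!', 'Line #3', 'Line #4');
my @file2 = ('Line #1', 'FOOBAR!', 'Line #2', 'Line #3', 'Line #4');
print "$_\n" for lcs_diff(\@file1, \@file2);
```

For the two sample files this prints `+ 2: FOOBAR!` and `- 3: FOOBAR!`: the only reported difference is the FOOBAR! line, which appears at line 2 of file #2 and at line 3 of file #1.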

In your program, you do a line-by-line comparison, and it's pretty good. You use a lot of the more modern syntax, and you use strict and warnings. If I were writing it, I would do my loop a bit differently. I'd probably use an infinite loop and break out of it when I run out of lines from either file:

for (;;) {
    my $line1 = <$in1>;
    my $line2 = <$in2>;
    # Test with defined: a final line of "0" with no newline is false but is still a line
    if    ( not defined $line1 and defined $line2 ) {
        say STDERR "ERROR: File #1 is shorter than File #2";
        last;
    }
    elsif ( defined $line1 and not defined $line2 ) {
        say STDERR "ERROR: File #2 is shorter than File #1";
        last;
    }
    elsif ( not defined $line1 and not defined $line2 ) {
        say "Both files are the same length";
        last;
    }
    chomp $line1;
    chomp $line2;
    ...   # Compare the lines, etc.
}

My reasoning would be that you don't know which file will end first, and that looping over each line in one file is misleading. You're reading both files until one of them runs out of lines. (I would also use say, which I like much better than print, and autodie, since you're just dying anyway if the files can't be opened.)
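Putting the loop together with say and autodie might look like the sketch below. The helper name `compare_files` and the wording of the length messages are assumptions; the mismatch message copies the format requested in the question.

```perl
use strict;
use warnings;
use autodie;          # open and close die on failure without an explicit "or die"
use feature 'say';

# Hypothetical helper: read both files in step and return one message per
# mismatching line, plus a message if one file ends before the other.
sub compare_files {
    my ($file1, $file2) = @_;
    open my $in1, '<', $file1;
    open my $in2, '<', $file2;

    my @messages;
    my $lineno = 0;
    for (;;) {
        my $line1 = <$in1>;
        my $line2 = <$in2>;
        $lineno++;

        # Both files ended on the same line: nothing more to compare
        last if !defined $line1 && !defined $line2;

        if (!defined $line1) {
            push @messages, "Error:lineno:$lineno file #1 is shorter than file #2";
            last;
        }
        if (!defined $line2) {
            push @messages, "Error:lineno:$lineno file #2 is shorter than file #1";
            last;
        }

        chomp $line1;
        chomp $line2;
        push @messages, "Error:lineno:$lineno please check mismatch"
            if $line1 ne $line2;
    }

    close $in1;
    close $in2;
    return @messages;
}

# Usage: perl compare.pl file1.txt file2.txt > error.txt
if (@ARGV == 2) {
    say for compare_files(@ARGV);
}
```

Because the lines are chomped before the comparison, a missing newline at the very end of one file is not reported as a difference.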

You already are using Text::Diff which will do a file comparison for you and a bit more thoroughly than a simple line-by-line. This is why we use Perl modules. Good modules are tested in a wider arena and have found all of the various exceptions and other difficulties that make programming so difficult. Anticipating exceptions is what makes programming so difficult.

I would use Text::Diff and play around with it and its configuration. I have never used it. However, it might be possible to capture its output and use that to get the output you desire.
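As a rough sketch of that idea: Text::Diff can return a unified diff as a string, and the `@@ -start,count +start,count @@` hunk headers in it carry the line numbers. The helper name `hunk_ranges` is hypothetical, and the sketch assumes that the documented `STYLE` and `CONTEXT` options behave as described (with `CONTEXT => 0` each hunk covers only changed lines). Check the header format against the output of your installed version before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the line ranges out of unified-diff hunk headers
# such as "@@ -2,3 +2,3 @@" or "@@ -2 +2 @@" (a missing count means 1).
sub hunk_ranges {
    my ($diff_text) = @_;
    my @ranges;
    for my $line (split /\n/, $diff_text) {
        next unless $line =~ /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;
        push @ranges, {
            old_start => $1,
            old_count => defined $2 ? $2 : 1,
            new_start => $3,
            new_count => defined $4 ? $4 : 1,
        };
    }
    return @ranges;
}

# Only attempt the real comparison if Text::Diff is installed and both
# input files are present.
if (-e 'file1.txt' && -e 'file2.txt' && eval { require Text::Diff; 1 }) {
    my $diff = Text::Diff::diff('file1.txt', 'file2.txt',
                                { STYLE => 'Unified', CONTEXT => 0 });
    for my $r (hunk_ranges($diff)) {
        # A count of 0 marks a pure insertion; report its position as one line
        my $last = $r->{old_start} + ($r->{old_count} || 1) - 1;
        print "Error:lineno:$_ please check mismatch\n"
            for $r->{old_start} .. $last;
    }
}
```

The line numbers reported here refer to the first file. Unlike the positional loop, an inserted line shifts nothing else, so only the lines that really changed are listed.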
