Perl: how to compare two files?

I am trying to write a Perl script that compares two text files. The differences between the files should be printed to the file error.txt, together with the line number.

Example:

File 1:

Figure 1.
Somatotropes are organized into.
Figure 2.
Comparing two xml files organized into.
Figure 3.
Somatotropes presentation of GH1,

File 2:

Figure 1.
children with acquired organized into.
Figure 2.
Severe anterior hypoplasia,
Figure 3.
Somatotropes presentation of GH1,

Output required in error.txt:

Error:lineno:2 please check mismatch
Error:lineno:4 please check mismatch

This is my code so far:

use strict;
use warnings;
use Text::Diff;

my $file1 = 'file1.txt';
my $file2 = 'file2.txt';
my $error = 'error.txt';

open(my $in1, '<', $file1) or die "Cannot open file '$file1' for reading: $!";
open(my $in2, '<', $file2) or die "Cannot open file '$file2' for reading: $!";
open(my $out, '>', $error) or die "Cannot open file '$error' for writing: $!";

my $lineno = 1;

while (my $line1 = <$in1>)
{
    my $line2 = <$in2>;

    printf $out "Error:lineno:%d please check mismatch\n", $lineno
        unless $line1 eq $line2;

    ++$lineno;
}

# The idea: compare line by line and, for every mismatch found, grab
# its position (the line number) and print it to error.txt.

# Append the Text::Diff output while $out is still open.
my $diff  = diff "file1.txt", "file2.txt";

print $out $diff;

close $out or die "Cannot close file '$error': $!";
close $in2 or die "Cannot close file '$file2': $!";
close $in1 or die "Cannot close file '$file1': $!";

Here is a simple example:

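#!/usr/bin/perl
use strict;
use warnings;

# Read both files into arrays, one element per line.
open(my $fh1, '<', 'file1.txt') or die "Cannot open file1.txt: $!";
my @file1 = <$fh1>;
close $fh1;
open(my $fh2, '<', 'file2.txt') or die "Cannot open file2.txt: $!";
my @file2 = <$fh2>;
close $fh2;

my @errors = ();

# Compare line by line; this assumes both files have the same number of lines.
for (my $line = 0; $line < scalar(@file1); $line++) {
    if ($file1[$line] ne $file2[$line]) {
        push(@errors, "Error:lineno:" . ($line + 1));
    }
}

# Write the collected messages to error.txt.
open(my $err, '>', 'error.txt') or die "Cannot open error.txt: $!";
foreach (@errors) {
    print $err $_ . "\n";
}
close $err;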

First it opens the files and reads them into arrays. Then, in a loop, it compares each pair of lines and, if they differ, pushes a message onto the errors array. At the end it writes the errors to your error file.

The code will fail on files of different lengths. I leave it to you to implement that handling, along with the exact error message.
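One way to handle files of different lengths is sketched below. This is only an illustration, not part of the answer above: mismatched_lines is a hypothetical helper name, and treating a line that is missing from the shorter file as a mismatch is my own assumption about what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two array refs of lines and return the
# 1-based numbers of the lines that differ. A line that exists in only
# one of the inputs is counted as a mismatch (an assumption).
sub mismatched_lines {
    my ($lines1, $lines2) = @_;

    # Iterate up to the last index of the longer input.
    my $last = @$lines1 > @$lines2 ? $#$lines1 : $#$lines2;

    my @mismatches;
    for my $i (0 .. $last) {
        my ($l1, $l2) = ($lines1->[$i], $lines2->[$i]);
        push @mismatches, $i + 1
            unless defined $l1 && defined $l2 && $l1 eq $l2;
    }
    return @mismatches;
}

my @file1 = ("Figure 1.\n", "Somatotropes are organized into.\n", "Figure 2.\n");
my @file2 = ("Figure 1.\n", "children with acquired organized into.\n");

# Line 2 differs and line 3 exists only in @file1, so two messages are printed.
print "Error:lineno:$_ please check mismatch\n"
    for mismatched_lines(\@file1, \@file2);
```

In the script above, the two arrays would come from reading file1.txt and file2.txt, and the messages would go to error.txt instead of standard output.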

What type of diff are you attempting? Are you assuming that the two files have the same number of lines? In a true diff, you have to assume that lines might not always line up. Let's look at these two files:

File #1

Line #1
Line #2
FOOBAR!
Line #3
Line #4

File #2

Line #1
FOOBAR!
Line #2
Line #3
Line #4

We look at this and say, "In file #1, there's an added line FOOBAR! between Line #2 and Line #3. In file #2, this line is between Line #1 and Line #2." A diff program would say these files are pretty much identical except for that FOOBAR! line.

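However, a line-by-line comparison pairs lines purely by position. Here it would flag lines 2 and 3 as mismatches even though nothing was changed, only moved. If FOOBAR! had appeared in just one of the files, every line after it would be reported as different.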
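To make the contrast concrete, here is a minimal sketch of how a diff-style comparison can work, using a longest-common-subsequence (LCS) table. This is a textbook technique that I am swapping in for illustration, not necessarily what any particular diff tool or module does internally, and lcs_diff is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: a minimal LCS-based diff. Returns strings of the
# form "-N: text" (line N exists only in the old file) and "+N: text"
# (line N exists only in the new file).
sub lcs_diff {
    my ($old, $new) = @_;
    my ($n, $m) = (scalar @$old, scalar @$new);

    # $len[$i][$j] = length of the LCS of $old->[$i..] and $new->[$j..].
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            if ($old->[$i] eq $new->[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($skip_old, $skip_new) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $skip_old > $skip_new ? $skip_old : $skip_new;
            }
        }
    }

    # Walk both files from the top, emitting only the lines that are not shared.
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < $n && $j < $m) {
        if ($old->[$i] eq $new->[$j]) {
            $i++;
            $j++;
        }
        elsif ($len[$i + 1][$j] > $len[$i][$j + 1]) {
            push @out, sprintf('-%d: %s', $i + 1, $old->[$i]);
            $i++;
        }
        else {
            push @out, sprintf('+%d: %s', $j + 1, $new->[$j]);
            $j++;
        }
    }
    # Whatever is left over in either file is unmatched.
    while ($i < $n) {
        push @out, sprintf('-%d: %s', $i + 1, $old->[$i]);
        $i++;
    }
    while ($j < $m) {
        push @out, sprintf('+%d: %s', $j + 1, $new->[$j]);
        $j++;
    }
    return @out;
}

my @file1 = ('Line #1', 'Line #2', 'FOOBAR!', 'Line #3', 'Line #4');
my @file2 = ('Line #1', 'FOOBAR!', 'Line #2', 'Line #3', 'Line #4');
print "$_\n" for lcs_diff(\@file1, \@file2);
# prints:
# +2: FOOBAR!
# -3: FOOBAR!
```

Only the FOOBAR! line is reported: it appears at line 2 of file #2 and disappears from line 3 of file #1, and all the "Line #N" lines are recognised as common to both files.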

In your program, you do a line-by-line comparison, and it's pretty good. You use a lot of modern syntax, and you use strict and warnings. If I were writing it, I would do my loop a bit differently. I'd probably use an infinite loop and break out of it when I run out of lines from either file:

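# say needs "use feature 'say';" (or "use v5.10;") at the top of the script.
for (;;) {
    my $line1 = <$in1>;
    my $line2 = <$in2>;
    # Test with defined: a final line such as "0" without a newline is false.
    if    ( not defined $line1 and defined $line2 ) {
        say STDERR "ERROR: File #1 is shorter than File #2";
        last;
    }
    elsif ( defined $line1 and not defined $line2 ) {
        say STDERR "ERROR: File #2 is shorter than File #1";
        last;
    }
    elsif ( not defined $line1 and not defined $line2 ) {
        say "Both files are the same length";
        last;
    }
    chomp $line1;
    chomp $line2;
    ...   # Compare the lines, etc.
}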

My reasoning is that you don't know which file will end first, and that looping over each line of one file is misleading. You're reading two files until one of them runs out of lines. (I would also use say, which I like much better than print, and autodie, since you're just dying anyway if the files can't be opened.)
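Putting those pieces together, a complete version might look roughly like the sketch below. The compare_files name, its return value, and the "ends here" message for a file that runs out early are my own assumptions. Only the "please check mismatch" format comes from the question.

```perl
use strict;
use warnings;
use autodie;            # open and close die with a useful message on failure
use feature 'say';

# Hypothetical helper: read both files in lockstep and write one message
# per mismatching line to $report. Returns the number of messages written.
sub compare_files {
    my ($path1, $path2, $report) = @_;
    open my $in1, '<', $path1;
    open my $in2, '<', $path2;
    open my $out, '>', $report;

    my ($lineno, $mismatches) = (0, 0);
    for (;;) {
        my $line1 = <$in1>;
        my $line2 = <$in2>;
        last if !defined $line1 && !defined $line2;   # both files exhausted
        $lineno++;

        if (!defined $line1 || !defined $line2) {
            # One file ran out first; report it once and stop.
            my $shorter = defined $line1 ? $path2 : $path1;
            say {$out} "Error:lineno:$lineno $shorter ends here";
            $mismatches++;
            last;
        }

        chomp($line1, $line2);
        next if $line1 eq $line2;
        say {$out} "Error:lineno:$lineno please check mismatch";
        $mismatches++;
    }

    close $_ for $in1, $in2, $out;
    return $mismatches;
}
```

Calling compare_files('file1.txt', 'file2.txt', 'error.txt') on the two files from the question should leave the two requested lines (for line 2 and line 4) in error.txt.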

You are already using Text::Diff, which will do a file comparison for you, and a bit more thoroughly than a simple line-by-line comparison. This is why we use Perl modules. Good modules are tested in a wider arena and have already run into the various exceptions and other difficulties that make programming so difficult. Anticipating exceptions is what makes programming so difficult.

I would use Text::Diff and play around with it and its configuration. I have never used it. However, it might be possible to capture its output and use that to get the output you desire.
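As a rough sketch of that idea, and with the caveat that it is untested against real Text::Diff output: assuming the captured output is in unified diff format (hunk headers of the form "@@ -start,count +start,count @@"), the line numbers can be recovered by walking each hunk. changed_lines_from_unified_diff is a hypothetical helper, and reporting only the file-1 lines marked with "-" is my own choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return the file-1 line numbers of every line that a
# unified diff marks as removed or changed ("-" lines inside a hunk).
# Assumes the text describes a single pair of files.
sub changed_lines_from_unified_diff {
    my ($diff_text) = @_;
    my @changed;
    my $old_lineno;                      # undefined until the first hunk header
    for my $line (split /\n/, $diff_text) {
        if ($line =~ /^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@/) {
            $old_lineno = $1 + 0;        # first file-1 line covered by this hunk
        }
        elsif (!defined $old_lineno) {
            next;                        # "---" / "+++" headers before any hunk
        }
        elsif ($line =~ /^-/) {
            push @changed, $old_lineno;  # line removed or changed in file 1
            $old_lineno++;
        }
        elsif ($line =~ /^ /) {
            $old_lineno++;               # context line, present in both files
        }
        # "+" lines exist only in file 2 and do not advance the file-1 counter.
    }
    return @changed;
}

# Only attempt the real comparison when Text::Diff is installed and the
# example files exist.
if (eval { require Text::Diff; 1 } && -e 'file1.txt' && -e 'file2.txt') {
    my $diff = Text::Diff::diff('file1.txt', 'file2.txt', { STYLE => 'Unified' });
    print "Error:lineno:$_ please check mismatch\n"
        for changed_lines_from_unified_diff($diff);
}
```

Lines that were only inserted in file 2 are not reported by this sketch, so you would need to decide how such cases should appear in error.txt.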
