Comparing data in two files in Perl

I have two files:

File A:

Folder name A
   cp A
   cp B
Folder name B
   cp D
   cp F

File B:

Folder name C
    cp A
    cp B
Folder name A
    cp A
    cp B
    cp C
Folder name B
    cp D
    cp F
Folder name D
    cp A
    cp D

The output should be:

Folder name C
    cp A
    cp B
Folder name D
    cp A
    cp D
Folder name A
    cp C

Basically, I want to check whether a folder name matches, and if it does, check for matching cp names within that same folder. The matches should then be removed. Can anyone help? I am new to Perl.

I have code that gives the folder names correctly but drops some of the cp names.

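use strict;
use warnings;

# Remember every line of fileA.txt
my %file2;
open my $file2, '<', 'fileA.txt' or die "Couldn't open fileA.txt: $!";
while (my $line = <$file2>)
{
   ++$file2{$line};
}

# Print each line of fileB.txt that never appeared in fileA.txt
open my $file1, '<', 'fileB.txt' or die "Couldn't open fileB.txt: $!";
while (my $line = <$file1>)
{
   print $line unless $file2{$line};
}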

There are two problems: parsing your data format, and doing the comparison. You can't just compare the files line by line. Your file has a structure, and you need to parse it into a Perl data structure.
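To see why line-by-line comparison fails, here is a minimal sketch of that approach. It is made whitespace-insensitive so the 3-space and 4-space indents in the two samples don't matter. The helper name line_by_line_diff is hypothetical, and the sample text is embedded as strings rather than read from files. Because a line such as "cp A" is compared without its folder, it is dropped everywhere it appears in File B:

```perl
use strict;
use warnings;

# Hypothetical helper: the line-by-line idea, ignoring indentation.
# Drops every line of File B whose trimmed text also appears anywhere in File A.
sub line_by_line_diff {
    my ($text_a, $text_b) = @_;

    my %seen_in_a;
    for my $line (split /\n/, $text_a) {
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        $seen_in_a{$line} = 1;
    }

    my @kept;
    for my $line (split /\n/, $text_b) {
        (my $trimmed = $line) =~ s/^\s+|\s+$//g;
        push @kept, $trimmed unless $seen_in_a{$trimmed};
    }
    return @kept;
}

my $file_a = "Folder name A\n   cp A\n   cp B\nFolder name B\n   cp D\n   cp F\n";
my $file_b = "Folder name C\n    cp A\n    cp B\nFolder name A\n    cp A\n    cp B\n    cp C\n"
           . "Folder name B\n    cp D\n    cp F\nFolder name D\n    cp A\n    cp D\n";

print "$_\n" for line_by_line_diff($file_a, $file_b);
```

With the sample data, this keeps only "Folder name C", "cp C" and "Folder name D". The cp lines of folders C and D are lost, and "cp C" now appears to belong to folder C.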

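sub parse_file {
    my $file = shift;

    open my $fh, '<', $file or die "Couldn't open $file: $!";

    my $in_folder;
    my %folders = ();

    while(<$fh>) {
        # Entering a folder (the lazy match keeps trailing spaces out of the name)
        if( /^Folder name (.*?)\s*$/ ) {
            $in_folder = $1;
            # Record the folder even if it turns out to have no commands
            $folders{$in_folder} ||= [];
        }
        # We're in a folder
        elsif( defined $in_folder ) {
            # Add an indented, non-blank line to the folder's commands
            if( /^\s+(\S.*?)\s*$/ ) {
                push @{$folders{$in_folder}}, $1;
            }
            # We exited the folder but didn't enter another one
            elsif( /^\S/ ) {
                $in_folder = undef;
            }
        }
    }

    close $fh;

    return \%folders;
}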

This is a lot of extra code to write and debug. If your files were stored in a format such as YAML, JSON or XML, you could use a library to parse them instead.
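For example, if each file were stored as JSON, the whole parser could be replaced by decode_json from JSON::PP, which ships with Perl since 5.14. This is a sketch under an assumed, hypothetical JSON layout that maps each folder name to an array of commands. The JSON text is inline to keep the example self-contained:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module since Perl 5.14

# Hypothetical JSON layout: folder name => list of commands
my $json_text = '{"A": ["cp A", "cp B"], "B": ["cp D", "cp F"]}';

# decode_json expects UTF-8 encoded text; here it returns a hash reference
my $folders = decode_json($json_text);

for my $name (sort keys %$folders) {
    print "Folder name $name\n";
    print "    $_\n" for @{ $folders->{$name} };
}
```

To read the JSON from a file, slurp it first: open my $fh, '<:raw', $path or die $!; my $json_text = do { local $/; <$fh> };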

I've deliberately stripped out the formatting (the "Folder name" prefix and the indentation) and stored only the folder names and their commands. This makes the data easier to work with and shields the rest of the code from formatting changes.

Now each file is represented as a hash that maps folder names to lists of commands. For File A:

      {
        'A' => [
                 'cp A',
                 'cp B'
               ],
        'B' => [
                 'cp D',
                 'cp F'
               ]
      }
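parse_file returns a reference to a hash like this one, so each folder's commands are reached through an array reference. The following is a short sketch of reading and modifying such a structure, with File A's data built by hand. The folder E and the command "cp X" are made up for the demo:

```perl
use strict;
use warnings;

# File A as parse_file would return it (a hash reference), built by hand here
my $folders = {
    A => [ 'cp A', 'cp B' ],
    B => [ 'cp D', 'cp F' ],
};

# Each folder's commands are an array reference; dereference with @{ ... }
my @commands_in_a = @{ $folders->{A} };

# Count the commands per folder; sort the keys because hash order is random
for my $name (sort keys %$folders) {
    my $count = scalar @{ $folders->{$name} };
    print "Folder $name has $count command(s)\n";
}

# Pushing onto a missing folder autovivifies it as a new array reference
push @{ $folders->{E} }, 'cp X';
```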

Now we need to compare them. The algorithm is:

  1. If a folder is in only one file, include it.
  2. If a folder is in both files, show the differences between its commands (if any).
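If you'd rather not install a CPAN module, the two steps above can be sketched with plain hashes used as sets. compare_folders_core is a hypothetical name. Like an approach based on array_diff, it reports the commands found in only one of the two files (a symmetric difference):

```perl
use strict;
use warnings;

# Hypothetical core-only comparison; no Array::Utils required.
# Takes two hash refs (folder name => [commands]) and returns a hash ref of
# the folders that differ, mapped to the commands found in only one file.
sub compare_folders_core {
    my ($set1, $set2) = @_;
    my %diffs;

    # Union of the folder names from both files
    my %all_names = map { $_ => 1 } keys %$set1, keys %$set2;

    for my $name (keys %all_names) {
        # Step 1: the folder is in only one file, so take it as-is
        if (!exists $set1->{$name} || !exists $set2->{$name}) {
            $diffs{$name} = [ @{ $set1->{$name} || $set2->{$name} } ];
            next;
        }

        # Step 2: the folder is in both files; keep commands present in only one
        my %in1 = map { $_ => 1 } @{ $set1->{$name} };
        my %in2 = map { $_ => 1 } @{ $set2->{$name} };
        my @only = ( (grep { !$in2{$_} } @{ $set1->{$name} }),
                     (grep { !$in1{$_} } @{ $set2->{$name} }) );

        $diffs{$name} = \@only if @only;
    }
    return \%diffs;
}
```

With the sample files, compare_folders_core(parse_file('fileA.txt'), parse_file('fileB.txt')) should return folders A (cp C), C and D, which matches the expected output apart from ordering.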

Fortunately, Array::Utils can do all the necessary intersections and differences for us. Use array_diff to find the folders that are in only one file, and intersect to find those that are in both. Then use array_diff again on each shared folder's commands to find the differences.
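Note that array_diff returns a symmetric difference: commands missing from either file. Your expected output only lists what is in File B but not in File A. To my understanding, Array::Utils also provides array_minus for that one-sided case. With your sample data both give the same answer, because every command in File A also appears in File B. The sketch below shows the distinction using hand-rolled, hypothetical stand-ins (sym_diff, minus) so it runs without the module. Unlike the module's functions, which are called with arrays directly as in the code below, these take array references:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a symmetric difference (like array_diff):
# items that appear in exactly one of the two lists
sub sym_diff {
    my ($x, $y) = @_;
    my %in_x = map { $_ => 1 } @$x;
    my %in_y = map { $_ => 1 } @$y;
    return ( (grep { !$in_y{$_} } @$x), (grep { !$in_x{$_} } @$y) );
}

# Hypothetical stand-in for a one-sided difference (like array_minus):
# items of the first list that are missing from the second
sub minus {
    my ($x, $y) = @_;
    my %in_y = map { $_ => 1 } @$y;
    return grep { !$in_y{$_} } @$x;
}

my @in_file_a = ('cp A', 'cp B', 'cp X');   # 'cp X' is made up for the demo
my @in_file_b = ('cp A', 'cp B', 'cp C');

print join(', ', sym_diff(\@in_file_a, \@in_file_b)), "\n";  # cp X, cp C
print join(', ', minus(\@in_file_b, \@in_file_a)), "\n";     # cp C
```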

# Array::Utils is on CPAN. Load it before this sub is compiled so that the
# calls below (some written without parentheses) parse correctly.
use Array::Utils qw(:all);

sub compare_folders {
    my($set1, $set2) = @_;

    my @set1_names = keys %$set1;
    my @set2_names = keys %$set2;

    my %diffs;

    # It's in one but not the other.
    for my $name (array_diff @set1_names, @set2_names) {
        $diffs{$name} = $set1->{$name} || $set2->{$name};
    }

    # It's in both.
    for my $name (intersect @set1_names, @set2_names) {
        # They're different
        if( my @diff = array_diff(@{$set1->{$name}}, @{$set2->{$name}}) ) {
            $diffs{$name} = \@diff;
        }
    }

    return \%diffs;
}

Finally, we need to display the results. Since I stripped out the formatting when parsing, we need to put it back.

sub display_folder {
    my($name, $values) = @_;

    my $display = "Folder name $name\n";

    for my $value (@$values) {
        $display .= "    $value\n"
    }

    return $display;
}
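The same string can also be built without an explicit loop by using map and join. This is a stylistic alternative, not a fix. The name display_folder_compact is hypothetical, and the four-space indent matches the sub above:

```perl
use strict;
use warnings;

# Same output as display_folder, built with map and join instead of a loop
sub display_folder_compact {
    my ($name, $values) = @_;
    return join '', "Folder name $name\n", map { "    $_\n" } @$values;
}

print display_folder_compact('A', ['cp C']);
```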

Then stick it all together. The script takes the two file names as command-line arguments.

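use strict;
use warnings;

# Expect exactly two file names, e.g. fileA.txt fileB.txt
die "Usage: $0 fileA.txt fileB.txt\n" unless @ARGV == 2;

my @folders = map { parse_file($_) } @ARGV;

my $diff = compare_folders(@folders);

# Sort the folder names so the output order is stable between runs
for my $name (sort keys %$diff) {
    my $values = $diff->{$name};
    print display_folder($name, $values);
}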
