Comparing data in two files in Perl

I have two files,

File A:

Folder name A
   cp A
   cp B
Folder name B
   cp D
   cp F

File B:

Folder name C
    cp A
    cp B
Folder name A
    cp A
    cp B
    cp C
Folder name B
    cp D
    cp F
Folder name D
    cp A
    cp D

The output should be:

Folder name C:
     cp A
     cp B
Folder name D
     cp A
     cp D
Folder name A
     cp C

Basically, I want to check whether a folder name appears in both files and then, for that folder, check whether the cp names match. Anything that matches should be deleted, leaving only the differences. Can anyone help? I am new to Perl.

I have code where it gives the folder names properly but deletes some of the cp names.

my %file2;
open my $file2, '<', 'fileA.txt' or die "Couldnt open fileA.txt";
while (my $line = <$file2>)
{
   ++$file2{$line};
 }
open my $file1, '<', 'fileB.txt' or die "Couldnt open fileB.txt";
while (my $line = <$file1>)
{
   print $fh $line unless $file2{$line};
 }

There are two problems: parsing your data format, and doing the comparison. You can't just compare the files line by line. A line such as "cp A" means something different depending on which folder it sits under, so you need to parse each file into a Perl data structure first.

sub parse_file {
    my $file = shift;

    open my $fh, '<', $file or die "Couldn't open $file: $!";

    my $in_folder;
    my %folders = ();

    while(<$fh>) {
        # Entering a folder (non-greedy capture drops trailing whitespace)
        if( /^Folder name (.*?)\s*$/ ) {
            $in_folder = $1;
        }
        # We're in a folder (defined, so a folder named "0" still counts)
        elsif( defined $in_folder ) {
            # Add a line to the folder actions; \S skips blank lines
            if( /^\s+(\S.*?)\s*$/ ) {
                push @{$folders{$in_folder}}, $1;
            }
            # We exited the folder but didn't enter another one
            elsif( /^\S/ ) {
                undef $in_folder;
            }
        }
    }

    return \%folders;
}

This is a lot of extra code to write and debug. If your files were stored in something like YAML, JSON or XML you could use a library to do it.
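As a rough illustration of that point, here is a minimal sketch assuming File A had been saved as JSON instead. The JSON layout is hypothetical (the files in the question are not JSON); JSON::PP ships with Perl since 5.14, and decode_json hands back the same folder-to-commands structure without any hand-written parsing.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical JSON equivalent of File A; the original files are not JSON.
my $json = <<'END_JSON';
{
    "A": ["cp A", "cp B"],
    "B": ["cp D", "cp F"]
}
END_JSON

# decode_json returns a hash reference: folder name => array of commands.
my $folders = decode_json($json);

for my $name (sort keys %$folders) {
    print "Folder name $name\n";
    print "    $_\n" for @{ $folders->{$name} };
}
```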

I've deliberately opted to strip out the formatting and store just the folder names and their commands. This makes the data easier to work with and shields the rest of the code from formatting changes.

Now each file is a hash of folder names which contain a list of commands.

      {
        'A' => [
                 'cp A',
                 'cp B'
               ],
        'B' => [
                 'cp D',
                 'cp F'
               ]
      }
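To make the shape of that structure concrete, here is a small sketch of the lookups the comparison relies on. The variable names are only illustrative, and the data is typed in by hand rather than read from a file.

```perl
use strict;
use warnings;

# Same shape as the parsed File A shown above.
my $folders = {
    'A' => [ 'cp A', 'cp B' ],
    'B' => [ 'cp D', 'cp F' ],
};

# Does a folder exist at all?
my $has_c = exists($folders->{C}) ? 'yes' : 'no';

# How many commands does folder A hold?
my $count_a = scalar @{ $folders->{A} };

# Is a particular command listed under folder B? (grep counts the matches)
my $has_cp_d = grep { $_ eq 'cp D' } @{ $folders->{B} };

print "Folder C present: $has_c\n";
print "Commands in A: $count_a\n";
print "B has 'cp D': ", ($has_cp_d ? 'yes' : 'no'), "\n";
```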

Now we need to compare them. The algorithm is like this:

  1. If a folder is in only one file, pick it.
  2. If a folder is in both files, show the differences (if any).

Fortunately, we have Array::Utils to do all the necessary intersections and diffs for us. Use array_diff, which returns the elements found in exactly one of two arrays, to find which folders are in only one file, and intersect to find those which are in both. Then use array_diff again to find the differences between the command lists.

use Array::Utils qw(array_diff intersect);

sub compare_folders {
    my($set1, $set2) = @_;

    my @set1_names = keys %$set1;
    my @set2_names = keys %$set2;

    my %diffs;

    # It's in one but not the other.
    for my $name (array_diff @set1_names, @set2_names) {
        $diffs{$name} = $set1->{$name} || $set2->{$name};
    }

    # It's in both.
    for my $name (intersect @set1_names, @set2_names) {
        # They're different
        if( my @diff = array_diff(@{$set1->{$name}}, @{$set2->{$name}}) ) {
            $diffs{$name} = \@diff;
        }
    }

    return \%diffs;
}
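If installing Array::Utils is not an option, the difference step can be approximated in core Perl. This is only a sketch: sym_diff is a hypothetical helper, not part of any module, and it assumes the command lists are passed as array references.

```perl
use strict;
use warnings;

# Hypothetical helper: items that appear in exactly one of the two lists.
sub sym_diff {
    my ($list1, $list2) = @_;

    # Lookup tables so membership checks are a single hash access.
    my %in1 = map { $_ => 1 } @$list1;
    my %in2 = map { $_ => 1 } @$list2;

    return (
        (grep { !$in2{$_} } @$list1),   # only in the first list
        (grep { !$in1{$_} } @$list2),   # only in the second list
    );
}

# Folder A from File A versus folder A from File B.
my @diff = sym_diff([ 'cp A', 'cp B' ], [ 'cp A', 'cp B', 'cp C' ]);
print "$_\n" for @diff;
```

Calling it with the two lists of folder names instead gives the folders that exist in only one file.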

Finally we need to display the results. As I've opted to make the data generic and strip out the formatting, we need to put it back.

sub display_folder {
    my($name, $values) = @_;

    my $display = "Folder name $name\n";

    for my $value (@$values) {
        $display .= "    $value\n";
    }

    return $display;
}

And stick it all together.

my @folders = map { parse_file($_) } @ARGV;

my $diff = compare_folders(@folders);

for my $name (keys %$diff) {
    my $values = $diff->{$name};
    print display_folder($name, $values);
}
