Comparing three files in Perl

I have three text files containing names and grades. I removed the grades and created new files with just the names. Here is what the files look like:

first.txt

Alice
Bob
Carl
Derrick
Jessica
Sarah
Zach

second.txt

Alice
Bob
Derrick
Jared
Jessica
Sarah
Zach

third.txt

Bob
Jared
Sarah
Slate
Terry
Zach

I want to compare all three files, and if a name appears in one file but not in another, I want to add it. At the end, all files should contain the same names. I know you cannot insert lines into an existing file in place in Perl, so a new file will have to be created to do this.

Here is my approach. I start by comparing the first and second files, adding names that are only in the second into the first. Then I compare them again, adding names that are only in the first into the second. Next I compare the second file (either would work) with the third, adding names from the second into the third. Finally I compare the second and third again, and add names that are only in the third to both the first and second. I also put in comparison checks to ensure the files end up with the same entries.

The files with grades are named original1.txt, original2.txt, and original3.txt.

In the end I will take the files containing the new names, and combine them with the files that have the grades. If there is no grade for a new name in the file, it will simply have no grade entry.

Is there a cleaner way of doing this? It looks like a huge mess.

Unless this is for a class or something where using perl is a hard requirement, the cleaner way is to not use perl at all, but standard shell utilities.

Assuming your originalN.txt files look something like:

Alice   A
Bob     B
Carl    C
Derrick D
Jessica A
Sarah   B
Zach    C

with tabs separating the columns

you can do:

sort -um <(cut -f1 original1.txt) \
         <(cut -f1 original2.txt) \
         <(cut -f1 original3.txt) > allnames.txt

to get a file with all the names from all three files. The -m option merges inputs that are already sorted; if the files are not already sorted by name, use sort -u ... instead. This does require bash, zsh, or ksh93 for the <(command) process substitution syntax.
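
If process substitution is not available, roughly the same list can be built with a plain pipeline. The following is a minimal sketch, not part of the original answer: collect_names is a made-up helper name, and it assumes the first tab-separated column of each file is the name. It always does a full sort -u, so the inputs do not need to be pre-sorted.

```shell
#!/bin/sh
# collect_names FILE...
# Hypothetical helper: print the sorted, de-duplicated first column of all
# the given files. cut -f1 passes through lines that contain no tab, so
# names-only files work as well.
collect_names() {
    cut -f1 "$@" | sort -u
}
```

It could then be called as collect_names original1.txt original2.txt original3.txt > allnames.txt.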

Then you can merge those names with each individual file with a left outer join:

$ join -t$'\t' -a1 allnames.txt original1.txt
Alice   A
Bob     B
Carl    C
Derrick D
Jared
Jessica A
Sarah   B
Slate
Terry
Zach    C

and so on.
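
The remaining joins could be scripted along these lines. This is only a sketch under assumptions: join_all and the .merged output suffix are made-up names, the first argument is taken to be the already-sorted name list, and each grade file is re-sorted on its first column because join expects both inputs in the same order. The sort and join are assumed to run under the same locale as the one used to build the name list.

```shell
#!/bin/sh
# join_all NAMES FILE...
# Hypothetical helper: left-outer-join the sorted name list NAMES against
# each tab-separated grade file, writing the result to FILE.merged.
join_all() {
    local names f tab
    names=$1
    shift
    tab=$(printf '\t')    # literal tab, avoids the bash-only $'\t' form
    for f in "$@"; do
        # Sort on the name column, then keep every line of NAMES (-a1),
        # with or without a matching grade. "-" makes join read stdin.
        sort -t "$tab" -k1,1 "$f" |
            join -t "$tab" -a1 "$names" - > "$f.merged"
    done
}
```

Running join_all allnames.txt original1.txt original2.txt original3.txt would leave the originals untouched and write original1.txt.merged and so on next to them.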


If using perl, there's no need for all those temporary files. Just stick the names from all the original files in a hash:

#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;

# Read all names from the files given on the command line.
my %names;
for my $file (@ARGV) {
    open my $infile, "<", $file;
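    while (<$infile>) {
        chomp;                 # a line with no tab would otherwise keep its newline
        next unless length;    # skip blank lines
        my $n = ( split /\t/ )[0];
        $names{$n} = 1;
    }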
}

# And for each file, merge with all the names
for my $file (@ARGV) {
    say "****** $file *******";
    open my $infile, "<", $file;
    my %grades = map { $_ => undef } keys %names;
    while (<$infile>) {
        chomp;
        next unless length;    # skip blank lines
        my ( $name, $grade ) = split /\t/;
        $grades{$name} = $grade;
    }
    for my $name ( sort keys %grades ) {
        if ( defined $grades{$name} ) {
            say "$name\t$grades{$name}";
        }
        else {
            say $name;
        }
    }
}

Writing the results to files instead of standard output is left as an exercise for the reader.
