Perl: Comparing a hash of arrays with another array

Given the files:

A. hash.pl:

%h1 = (
  A=>['4631 4576','6646 6646',],
  B=>['3539 4576',],
);

B. input.txt

4576    4631    4
4576    3539    4

I need to write Perl code that finds the pairs stored in the hash (for example, '4631 4576') in input.txt. The order within a pair does not matter: here, '4631 4576' appears as 4576 4631 in input.txt.

I wrote the following code but there seems to be some problem:

#!/usr/bin/perl -w
open (FH, "input.txt") or die "can't open file: &! \n";
require "hash.pl";
foreach $amp (<FH>)
{
    if ($amp=~/(\d+)\t(\d+)\t(\d+)/)        
    {       
        foreach $keys (keys %h1)
        {
            @tmparray= @{$h1{$keys}};
            foreach $tmp1 (@tmparray)
            {
                if ($tmp1 =~ m/($1 $2|$2 $1)/ )
                {
                    print "$keys", "$3\n";
                }
            }
        }
    }
}
close (FH);
exit;

What is wrong with this code?

This solution uses do in preference to require, as the latter is intended for including library source files and returns a useless scalar value in this context. do simply returns the value of the last statement executed, so it can be used to initialise a local variable.
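
As a rough illustration of that return value, the sketch below writes a stand-in for hash.pl to a temporary file (the temporary file and the names $tmp_fh and $tmp_path are assumptions made only so the example runs on its own) and loads it with do:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for hash.pl, written to a temporary file so the sketch is self-contained.
my ($tmp_fh, $tmp_path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$tmp_fh} "%h1 = (\n  A=>['4631 4576','6646 6646',],\n  B=>['3539 4576',],\n);\n";
close $tmp_fh or die "can't close $tmp_path: $!";

# do FILE returns the value of the last statement in the file: here, the
# key/value list produced by the hash assignment.
my %data = do $tmp_path;
die "could not load $tmp_path: ", $@ || $! unless %data;

printf "%s has %d pair(s)\n", $_, scalar @{ $data{$_} } for sort keys %data;
```

The hash ends up in a lexical %data, so the main program does not depend on the global %h1 that the file happens to assign to.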

Rather than using a regex, this program just calls split to collect the non-whitespace fields in the file. It then checks that there were three and that they were all numeric.

Putting the result of the split into an array avoids the problem that the captured regex fields were being lost.

The regular expression $re is built, allowing the first two fields to appear in either order, and then grep is called on each hash element to verify whether any of the values in the hash value arrays match this file entry.

The output seems minimal, but it contains the same information as the original code displayed.

use strict;
use warnings;

# The leading './' is needed on Perl 5.26 and later, where '.' is no longer in @INC
my %data = do './hash.pl';

open my $fh, '<', 'input.txt' or die $!;

while (<$fh>) {

  my @values = split;
  next if grep /\D/, @values or @values != 3;

  my $re = qr/\A$values[0]\s+$values[1]\z|\A$values[1]\s+$values[0]\z/;

  foreach my $key (keys %data) {
    print "$key - $values[2]\n" if grep $_ =~ $re, @{$data{$key}};
  }
}

Output:

A - 4
B - 4

The problem is quite simple: you're using $1, $2, and $3 in your program, but by the time you use them, you've lost their values. These are global symbols, and they're overwritten by every successful regular expression match. After your first regex match, simply save them in other variables:

$first  = $1;
$second = $2;
$third  = $3;
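
To see why the copy matters, here is a minimal sketch (the sample line and the names $saved_third and $third_after are illustrative, not taken from the original program). The second match has only one capture group, so once it succeeds $3 no longer holds the value from the first match:

```perl
use strict;
use warnings;

my $line = "4576\t4631\t4";
my ($saved_third, $third_after);

if ($line =~ /(\d+)\t(\d+)\t(\d+)/) {
    $saved_third = $3;                 # copy while $3 still holds "4"
    if ("4631 4576" =~ /($2 $1)/) {    # one capture group: sets $1, leaves $2 and $3 undefined
        $third_after = $3;             # undef: this match has no third group
    }
}

print "saved: $saved_third\n";
print "after second match: ", (defined($third_after) ? $third_after : "undef"), "\n";
# prints:
#   saved: 4
#   after second match: undef
```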

You should also be careful with regular expressions. Your regular expressions work, but they are very, very narrow. The first time through, I missed that your file is tab-separated. I like using \s+ for whitespace of any sort. This covers multiple tabs or spaces, or any combination of the two.
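
A small comparison makes the difference concrete. The three sample lines below are made up for the sketch: one separated by single tabs, one by runs of spaces, and one by a mix of both.

```perl
use strict;
use warnings;

my @lines = ("4576\t4631\t4", "4576    4631    4", "4576 \t 4631\t\t4");
my (@tab_only, @any_ws);

for my $line (@lines) {
    # \t demands exactly one tab between fields; \s+ accepts any run of whitespace.
    push @tab_only, ($line =~ /^(\d+)\t(\d+)\t(\d+)$/   ? "match" : "no match");
    push @any_ws,   ($line =~ /^(\d+)\s+(\d+)\s+(\d+)$/ ? "match" : "no match");
}

print "tab only: @tab_only\n";
print "any whitespace: @any_ws\n";
```

Only the first line satisfies the tab-only pattern, while all three satisfy the \s+ version.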

I also highly recommend you learn some more modern Perl. You would have immediately picked up the problem if you had used these two lines in your program:

use strict;
use warnings;

The strict pragma makes sure that you have declared your variables via my or our. That makes sure that you don't say $Foo in one place and $foo in another and wonder what happened to the value you stored in $foo.

The warnings pragma would have immediately highlighted that $1 and $2 don't have values when you do your second regular expression match.

Because of the require, variable declaration gets a bit sticky when you use strict. A my variable is a lexical variable with limited scope, which is why it's used 99% of the time.

A my variable only exists in the scope where it's declared. For example, if you declare a variable inside an if block, it doesn't exist outside that block:

if ($a > $b) {
    my $highest = $a;
} 
else {
    my $highest = $b;
}
print "The highest value is $highest\n";

This won't work because $highest is defined inside the if statement. You'll have to declare $highest outside the statement for it to work:

my $highest;
if ($a > $b) {
    $highest = $a;
} 
else {
    $highest = $b;
}
print "The highest value is $highest\n";

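An our declaration refers to a package variable, which is global to the whole package. You can declare it anywhere (inside a loop, inside an if statement, anywhere) and the variable itself lives on afterwards, although under strict another scope needs its own our declaration, or the fully qualified name such as %main::h1, to refer to it.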

A package is just a namespace. Unless you've declared otherwise, you're always in the main package. It's useful to prevent module variables from affecting variables in your code. This way, your included module can use the variable $foo, and you can use the variable $foo without interfering with each other.
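
As a quick sketch of that separation (the package name Counter and both values are made up for the example), two packages can each have their own $foo:

```perl
use strict;
use warnings;

package Counter {                  # hypothetical module namespace
    our $foo = "module value";     # this is $Counter::foo
}

our $foo = "script value";         # back in package main: this is $main::foo

print "$Counter::foo / $main::foo\n";
# prints: module value / script value
```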

The reason I had to go into this is your require. A my variable is only available in its scope: the for loop, the if statement, or the entire file. Notice that last one: the entire file. This means that if I declare my %h1 in one file, it won't exist in the other. Thus, I have to declare it with our.

Also, when you use strict, it is pretty darn strict. It generates a compile-time error when it sees a variable that hasn't been declared. Thus, I have to declare %h1 inside the main program, so the compiler knows about it.
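
The interaction between our and require can be tried in isolation. This sketch assumes a temporary file standing in for hash.pl (the names $tmp_fh and $tmp_path and the trailing 1; are additions for the demo, not part of the original files):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a stand-in for hash.pl to a temporary file (hypothetical setup for the demo).
my ($tmp_fh, $tmp_path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$tmp_fh} "%h1 = ( A => ['4631 4576'], B => ['3539 4576'] );\n1;\n";
close $tmp_fh or die "can't close $tmp_path: $!";

our %h1;              # tells strict that %h1 means the package variable %main::h1
require $tmp_path;    # the file assigns to %main::h1, the same variable

print join(",", sort keys %h1), "\n";   # prints: A,B
```

Replacing our with my here would leave the main program with an empty lexical %h1, because the required file would still be assigning to the package variable.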

I also use the say statement, which I get from use feature qw(say). It's like print, except that it always appends a newline character. It doesn't seem like much, but it can be a lot less messy in many circumstances.

It is now highly recommended that you open a file into a declared (lexical) scalar instead of a bareword file handle. Bareword file handles are global and can cause problems. Plus, it's hard to pass a bareword file handle to a subroutine. It's also recommended to use the three-argument form of open. This prevents problems when file names start with > or |.
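
Here is a small sketch of both recommendations together. The helper count_valid_lines is hypothetical, and an in-memory file stands in for input.txt so that the example runs without any files on disk:

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines that have exactly three numeric fields.
sub count_valid_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        $count++ if @fields == 3 && !grep { /\D/ } @fields;
    }
    return $count;
}

# Three-argument open on a scalar reference gives an in-memory file handle.
my $contents = "4576\t4631\t4\n4576\t3539\t4\nnot a data line\n";
open(my $input_fh, '<', \$contents) or die "can't open in-memory file: $!";
my $valid = count_valid_lines($input_fh);   # a lexical handle passes like any scalar
close $input_fh;

print "$valid valid lines\n";   # prints: 2 valid lines
```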

Here's the program rewritten with a bit more modern Perl flair. I kept your standard algorithm, but added the new pragmas, declared %h1 before requiring it, and used the more standard open. Otherwise, it's pretty much what you had.

#! /usr/bin/env perl
#

use strict;
use warnings;
use feature qw(say);

our %h1;
# The leading "./" is needed on Perl 5.26 and later, where "." is no longer in @INC
require "./hash.pl";


open ( my $input_fh, "<", "input.txt" )
    or die "can't open file: $! \n";

foreach my $amp ( <$input_fh> ) {
    chomp $amp;
    if ( $amp =~ /(\d+)\s+(\d+)\s+(\d+)/ ) {
        # Got to save the $1, $2, and $3 for later
        my $first = $1;
        my $second = $2;
        my $third = $3;
        foreach my $key ( keys %h1 ) {
            foreach my $tmp1 ( @{$h1{$key}} ) {
                if ($tmp1 =~ /($first\s+$second|$second\s+$first)/ ) {
                    say qq("$key": "$third");
                }
            }
        }
    }
}
close $input_fh;

You are trying to reuse the variables $1, $2 and $3 inside another regex, and I suspect that is what is messing things up. When I try your code, I get the error:

Use of uninitialized value $2 in regexp compilation ...

So a possible solution is to copy the values right after they are captured, so that $1 and the other capture variables cannot be overwritten by the second regex:

if ($amp=~/(\d+)\t(\d+)\t(\d+)/) {       
    my @args = ($1,$2,$3);

And then of course replace $1 with $args[0] and so forth.
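
A compact variation on the same idea, offered as a sketch rather than as the exact change described above: in list context a successful match returns its captures, so they can be assigned straight to named variables and $1, $2 and $3 never need to be read. The inline %h1 and the two sample lines copy the question's data so the example is self-contained, and @hits is a name introduced for the demo.

```perl
use strict;
use warnings;

my %h1 = (A => ['4631 4576', '6646 6646'], B => ['3539 4576']);
my @hits;

for my $amp ("4576\t4631\t4", "4576\t3539\t4") {
    # The list assignment is true only when the match succeeded.
    if (my ($first, $second, $third) = $amp =~ /(\d+)\t(\d+)\t(\d+)/) {
        for my $key (sort keys %h1) {
            for my $pair (@{ $h1{$key} }) {
                # Accept the two numbers in either order, anchored at both ends.
                push @hits, "$key $third"
                    if $pair =~ /\A(?:$first $second|$second $first)\z/;
            }
        }
    }
}

print "$_\n" for @hits;
# prints:
#   A 4
#   B 4
```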

You should also be aware that running a script without use warnings is not a good idea. The time you think you save by being lazy will be lost ten times over debugging simple errors. See also the question "Why use strict and warnings?".

