Load regex from file and match groups with it in Perl

I have a file containing regular expressions, e.g.:

City of (.*)
(.*) State

Now I want to read these line by line, match each one against a string, and print the captured group. For example, the string City of Berlin should match the first expression from the file, City of (.*), and Berlin should then be extracted.

This is what I've got so far:

use warnings;
use strict;

my @pattern;

open(FILE, "<pattern.txt");   # open the file described above
while (my $line = <FILE>) {
    push @pattern, $line;     # store it inside the @pattern variable
}
close(FILE);

my $exampleString = "City of Berlin"; # line that should be matched and
                                      # Berlin should be extracted

foreach my $p (@pattern) { # try each pattern
    if (my ($match) = $exampleString =~ /$p/) {
        print "$match";
    }
}

I want Berlin to be printed.

  • What happens with the regex inside the foreach loop?
  • Is it not compiled? Why?
  • Is there even a better way to do this?

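Each line read from the file still ends with a newline character, so the pattern becomes City of (.*) followed by a newline, which cannot match a string that contains no newline. You need to chomp each line: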

while (my $line = <FILE>) {
    chomp $line;
    push @pattern, $line;
}
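Putting that fix together, a minimal sketch of the corrected loop might look like the following. It assumes the pattern lines are held in an in-memory list rather than read from pattern.txt, so that the example is self-contained, and the sub name first_capture is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group of the first
# pattern that matches $string, or nothing if no pattern matches.
sub first_capture {
    my ($string, @raw_patterns) = @_;
    for my $raw (@raw_patterns) {
        my $pattern = $raw;       # copy so the caller's data is untouched
        chomp $pattern;           # drop the trailing newline read from the file
        next if $pattern eq '';   # skip blank lines
        my $re = qr/$pattern/;    # compile the pattern string into a regex
        if ( my ($match) = $string =~ $re ) {
            return $match;
        }
    }
    return;
}

# Lines as they would arrive from <FILE>, newline included.
my @lines = ("City of (.*)\n", "(.*) State\n");

print first_capture('City of Berlin', @lines), "\n";   # prints "Berlin"
```

Returning from the sub on the first successful match also means the remaining patterns are not tried once a result is found.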

First off, the missing chomp is the root of your problem.

Secondly, your code is also inefficient. Rather than testing each pattern in a foreach loop, consider combining the patterns into a single regex that is compiled in advance:

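#!/usr/bin/env perl
use strict;
use warnings;

# To read from a file instead of the __DATA__ section below:
# open( my $pattern_fh, '<', "pattern.txt" ) or die $!;
my @patterns = <DATA>;
chomp(@patterns);
my $regex = join( '|', @patterns );
# (?|...) is a branch-reset group (Perl 5.10+): the capture from whichever
# alternative matches is always $1. With (?:...) the second pattern's
# capture would land in $2 and $match would be undef.
$regex = qr/(?|$regex)/;
print "Using regex of: $regex\n";

my $example_str = 'City of Berlin';

if ( my ($match) = $example_str =~ m/$regex/ ) {
    print "Matched: $match\n";
}

__DATA__
City of (.*)
(.*) State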

Why is this better? Because it scales more efficiently. With your original algorithm, if there are 100 lines in the patterns file and 100 lines to check against them, that means up to 10,000 separate match attempts.

With a single regex, you make one match attempt per line, and the regex engine works through the alternatives internally.
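As a rough illustration of the single-regex approach applied to several input lines, the sketch below builds one alternation and runs it once per line. The input strings and the sub name build_regex are made up for the example. It wraps the alternatives in a branch-reset group, (?|...), available since Perl 5.10, so that the capture from whichever alternative matched is always $1. With a plain (?:...) group the captures are numbered left to right across all alternatives, so a match on the second pattern would leave its text in $2.

```perl
use strict;
use warnings;

# Hypothetical helper: combine pattern strings into one compiled regex.
sub build_regex {
    my @patterns = @_;      # copy, so chomp does not touch the caller's data
    chomp @patterns;
    my $alternation = join '|', @patterns;
    # (?|...) restarts group numbering in each alternative (Perl 5.10+).
    return qr/(?|$alternation)/;
}

my $regex = build_regex("City of (.*)\n", "(.*) State\n");

for my $line ('City of Berlin', 'Texas State', 'no match here') {
    if ( my ($match) = $line =~ $regex ) {
        print "Matched: $match\n";
    }
}
```

Only the first two strings produce output, with Berlin and Texas as the extracted values.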

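Note: when strings read from a file are interpolated into a regex, you would normally pass them through quotemeta first, which escapes the regex metacharacters so that the text is matched literally. We don't want to do that here, because the lines in the file are meant to be regular expressions.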
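To make the difference concrete, here is a small sketch contrasting a line used as a regex with the same line passed through quotemeta. The sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $line = 'City of (.*)';

# Used as-is: the parentheses and .* keep their regex meaning.
my $as_regex = qr/$line/;

# Escaped with quotemeta: the pattern matches the text "City of (.*)" literally.
my $escaped    = quotemeta $line;
my $as_literal = qr/$escaped/;

print "regex matches Berlin\n"          if     'City of Berlin' =~ $as_regex;
print "literal matches itself\n"        if     'City of (.*)'   =~ $as_literal;
print "literal does not match Berlin\n" unless 'City of Berlin' =~ $as_literal;
```

All three lines are printed, which is why escaping would defeat the purpose when the file is supposed to contain patterns.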

If you want something even more concise, you can use map to avoid the intermediate array:

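open( my $pattern_fh, '<', 'pattern.txt' ) or die "Cannot open pattern.txt: $!";
my $regex = join( '|', map { chomp; $_ }  <$pattern_fh>  );
close($pattern_fh);
$regex = qr/(?|$regex)/;   # branch reset (Perl 5.10+): every alternative captures into $1
print "Using regex of: $regex\n";

my $example_str = 'City of Berlin';

if ( my ($match) = $example_str =~ m/$regex/ ) {
    print "Matched: $match\n";
}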
