
Perl - Combine multiple regexps without renumbering?

I need to combine multiple regexps into one, so code which looks like this:

my $s = "jump 0xbdf3487";
#my $s = "move 0xbdf3487";                                                                                         

if ($s =~ m/^(move) ([^ ]+)/) {  print "matched '$1' '$2'\n";  }
if ($s =~ m/^(jump) ([^ ]+)/) {  print "matched '$1' '$2'\n";  }
if ($s =~ m/^(call) ([^ ]+)/) {  print "matched '$1' '$2'\n";  }

becomes:

my $s = "jump 0xbdf3487";
#my $s = "move 0xbdf3487";                                                                                         

my @patterns = (
    '^(move) ([^ ]+)',
    '^(jump) ([^ ]+)',
    '^(call) ([^ ]+)'
  );

my $re = "(?:" . join("|", @patterns) . ")";
$re = qr/$re/;

if ($s =~ m/$re/) {  print "matched '$1' '$2'\n";  }

This doesn't work, however: if $s is a jump, we get:

matched '' ''

Capture groups in the combined regexp get renumbered:
($1, $2) become ($3, $4) in the jump branch, ($5, $6) in the call branch, and so on.

How do I combine these without renumbering?

Use the branch reset pattern (?|pattern), which requires Perl 5.10 or newer. Quoting the documentation (perldoc perlre):

This is the "branch reset" pattern, which has the special property that the capture groups are numbered from the same starting point in each alternation branch.

Your code becomes:

use strict; 
use warnings;

my $s = "jump 0xbdf3487";
#my $s = "move 0xbdf3487";                                                                                         

my @patterns = (
    '(move) ([^ ]+)',
    '(jump) ([^ ]+)',
    '(call) ([^ ]+)'
  );

my $re = "^(?|" . join("|", @patterns) . ")";
$re = qr/$re/;
if ($s =~ m/$re/) {  print "matched '$1' '$2'\n";  }

Note that I've added use strict and use warnings; don't forget them!
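To make the difference visible, here is a minimal sketch that runs the same alternatives through a plain non-capturing group and through a branch reset group and lists the captures each one returns. The helper name captures is made up for this illustration; it relies on a match in list context returning one value per capture group.

```perl
use strict;
use warnings;

my @patterns = ('(move) ([^ ]+)', '(jump) ([^ ]+)', '(call) ([^ ]+)');
my $alt = join '|', @patterns;

my $plain = qr/^(?:$alt)/;   # six groups, numbered 1..6 across the branches
my $reset = qr/^(?|$alt)/;   # two groups, numbered 1..2 in every branch

# Hypothetical helper: return the captures of a match as a comma-separated
# string, writing 'undef' for groups that did not take part in the match.
sub captures {
    my ($re, $s) = @_;
    my @c = $s =~ $re;
    return join ',', map { defined $_ ? $_ : 'undef' } @c;
}

for my $s ("move 0x10", "jump 0xbdf3487", "call 0x20") {
    print "$s\n";
    print "  plain: ", captures($plain, $s), "\n";
    print "  reset: ", captures($reset, $s), "\n";
}
```

For "jump 0xbdf3487" the plain version yields undef,undef,jump,0xbdf3487,undef,undef, while the branch reset version yields jump,0xbdf3487, which is why $1 and $2 work again.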

You can use simple alternation in your regex and use just a single regex:

m/^(move|jump|call) ([^ ]+)/

Code:

my $s = "jump 0xbdf3487";

if ($s =~ m/^(move|jump|call) ([^ ]+)/) {
   print "matched '$1' '$2'\n";
}

Perl Regex subpatterns can be joined together with pipes to make them alternating patterns. To separate alternating patterns from the rest of the expression pattern, delimit them as a group. If you don't want to capture what was matched by the group, make it a non-capturing group.

For example, alternation in a capturing group within a pattern:

(move|jump|call) ([^ ]+)

And alternation in a non-capturing group within a pattern:

(?:move|jump|call) ([^ ]+)
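As a small sketch of what that changes in practice: with the non-capturing group the instruction name is matched but not saved, so the operand moves up to $1. The sub names mnemonic_and_operand and operand_only are invented for this example.

```perl
use strict;
use warnings;

# Capturing group: returns (mnemonic, operand), i.e. ($1, $2).
sub mnemonic_and_operand {
    my ($s) = @_;
    return $s =~ m/^(move|jump|call) ([^ ]+)/ ? ($1, $2) : ();
}

# Non-capturing group: the mnemonic is still required to match,
# but only the operand is captured, so it ends up in $1.
sub operand_only {
    my ($s) = @_;
    return $s =~ m/^(?:move|jump|call) ([^ ]+)/ ? $1 : undef;
}

my $s = "jump 0xbdf3487";
my ($op, $arg) = mnemonic_and_operand($s);
print "capturing:     '$op' '$arg'\n";
print "non-capturing: '", operand_only($s), "'\n";
```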

If your alternative patterns are complicated and you don't want them all on one line, you can use the /x modifier to separate them with whitespace.

Perldoc PerlRe Modifiers (scroll down to "Details on some modifiers")

/x

/x tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class. You can use this to break up your regular expression into (slightly) more readable parts. Also, the "#" character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line. Hence, this is very much like an ordinary Perl code comment. (You can include the closing delimiter within the comment only if you precede it with a backslash, so be careful!)

Use of /x means that if you want real whitespace or "#" characters in the pattern (outside a bracketed character class, which is unaffected by /x), then you'll either have to escape them (using backslashes or \Q...\E) or encode them using octal, hex, or \N{} escapes. It is ineffective to try to continue a comment onto the next line by escaping the \n with a backslash or \Q.

And here's my example demonstrating that:

#!/usr/bin/perl

use strict;
use warnings;

my $s = "jump 0xbdf3487";

if ($s =~ /^(

          move   # first complicated pattern

          |

          jump   # second complicated pattern

          |

          call   # third complicated pattern

    )\s([^\ ]+) /x) {   # Note: /x ignores bare whitespace, so the separator
                        # is written as \s; the backslash inside [^\ ] is
                        # optional under plain /x

    print "matched '$1' '$2'\n";
}

Which outputs:

matched 'jump' '0xbdf3487'
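If you want to match exactly one space character rather than any whitespace (which is what \s matches), the quoted documentation suggests a few options. The following is a minimal sketch of three of them; the @variants array is just a name chosen for this illustration.

```perl
use strict;
use warnings;

my $s = "jump 0xbdf3487";

# Three ways to require a literal space under /x.
my @variants = (
    qr/^(jump) \  ([^ ]+)/x,     # backslash-escaped space
    qr/^(jump) [ ] ([^ ]+)/x,    # space inside a bracketed character class
    qr/^(jump) \x20 ([^ ]+)/x,   # hex escape for the space character
);

for my $re (@variants) {
    print "matched '$1' '$2'\n" if $s =~ $re;
}
```

All other whitespace in these patterns is there only for readability and is ignored by the regex parser.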
