简体   繁体   中英

How to get a group matched in a recursive regular expression?

I'm writing a simple regular expression that needs receive a pair of coordinates and/or a map name.

For example:

move 10 15 # should returns [[10, 15]]
move 10 15 map # should returns [[10, 15, 'map']]
move map # should returns [['map']]
move 10 15 mapA mapB # should returns [[10, 15, 'mapA'], ['mapB']] 
move 10 15 mapA mapB 33 44 # should returns [[10, 15, 'mapA'], ['mapB'], [33, 44]]
move 10 15 mapA 33 44 mapB # should returns [[10, 15, 'mapA'], [33, 44, 'mapB']]

Then, I wrote this regular expression:

/
  (?(DEFINE)
     (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
     (?<map>      (?<mapname>[a-zA-Z]+) )
     (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
  )
  move\s+(?&commands)
/six

But how I can get the value for groups x , y and map using Perl?

I tried with some ways:

use strict;
use warnings;

my $command = 'move 10 15';

$command =~ /
  (?(DEFINE)
     (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
     (?<map>      (?<mapname>[a-zA-Z]+) )
     (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
  )
  move\s+(?&commands)
/six;

while (my ($k,$v) = each %+) { print "$k $v\n" }
print "$+{x}";

As the question stands, you can't have that. The perlre says about this

Note that capture groups matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing groups is necessary.

but the pattern <x> can't be had with "an extra layer" of capturing afterwards since it's used only inside the grammar. You can only have the whole thing

if ($command =~ /
        move\s+ (?<match>(?&commands))
        (?(DEFINE)
            (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
            (?<map>      (?<mapname>[a-zA-Z]+) )
            (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
        )
    /six)
{
    say "got: $+{match}";
}

where I've moved the ?(DEFINED) block at the end of the pattern, as recommended.

Note that it wouldn't make sense either: in a recursive match, which of multiple <x> should one get? So you'd need to restructure the approach so to be able to re-capture the match you want; but I don't see how to do this if you want a subpattern buried so deeply.

For the problem as presented I'd go with writing a simple parser, never mind a catch-all regex. Or, in your approach re-process the match for its parts, hopefully much easier once you have it.

And then there are powerful tools, like Marpa::R2 , Parse::RecDescent , Regexp::Grammars .

Maybe it is better to divide and conquer than to force feed everything into one regex?

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

while (<DATA>) {
    my @row;
    chomp;
    if (/^move/) {
        while (/(?:(\d+)\s+(\d+))?(?:\s+([[:alpha:]]+))?/g) {
            my @match;
            push(@match, +$1, +$2) if $1 && $2;
            push(@match, $3)       if $3;
            push(@row, \@match) if @match;
        }
    }

    print "$_: ", Dumper(\@row);
}

exit 0;

__DATA__
move 10 15
move 10 15 map
move map
move 10 15 mapA mapB
move 10 15 mapA mapB 33 44
move 10 15 mapA 33 44 mapB

Test run:

$ perl dummy.pl
move 10 15: $VAR1 = [
          [
            '10',
            '15'
          ]
        ];
move 10 15 map: $VAR1 = [
          [
            '10',
            '15',
            'map'
          ]
        ];
move map: $VAR1 = [
          [
            'map'
          ]
        ];
move 10 15 mapA mapB: $VAR1 = [
          [
            '10',
            '15',
            'mapA'
          ],
          [
            'mapB'
          ]
        ];
move 10 15 mapA mapB 33 44: $VAR1 = [
          [
            '10',
            '15',
            'mapA'
          ],
          [
            'mapB'
          ],
          [
            '33',
            '44'
          ]
        ];
move 10 15 mapA 33 44 mapB: $VAR1 = [
          [
            '10',
            '15',
            'mapA'
          ],
          [
            '33',
            '44',
            'mapB'
          ]
        ];

Since I cannot yet comment, Stefan Becker's solution has a flaw.

It will fail if a coordinate is 0.

Here's the fix:

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

while (<DATA>) {
    my @row;
    chomp;
    if (/^move/) {
        while (/(?:(\d+)\s+(\d+))?(?:\s+([[:alpha:]]+))?/g) {
            my @match;
            push(@match, +$1, +$2) if defined $1 && defined $2;
            push(@match, $3)       if $3;
            push(@row, \@match) if @match;
        }
    }

    print "$_: ", Dumper(\@row);
}

exit 0;

__DATA__
move 10 15
move 10 15 map
move map
move 10 15 mapA mapB
move 10 15 mapA mapB 33 44
move 10 15 mapA 33 44 mapB
move 0 15 mapA 33 44 mapB

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM