How to capture a single leading group plus multiple repeated groups of the same pattern with a Perl regex

I have data in the following format:

n123a456ba789ba101112ba131415b
n124a12345ba78910ba101113b
n125a1234ba7891ba101114ba131415ba16171819b

Each line starts with n, followed by some data up to the first a.

a marks the start of a field.

b marks the end of a field.

A line contains multiple a...b fields.

I want to capture the n data and the data between each a...b pair into an array. I tried the following code, but it didn't work.

$var = "n123a456ba789ba101112ba131415b";

($n, @groups) = $var =~ /n(.+?)(?:a(.+?)b)+/;

print join(',', $n, @groups);

You can use the following regex:

/(?|n([^a]+)(?=a)|a([^b]+)b)/g

How it works:

  • (?|...) branch reset - any subpatterns in such a group share the same number (this makes the capture groups share the same index if alternations exist)
  • n([^a]+)(?=a)|a([^b]+)b match either of the following options:
    • n([^a]+)(?=a)
      • n matches n literally
      • ([^a]+) captures any character except a one or more times into a capture group
      • (?=a) ensures the character a follows (positive lookahead) without consuming the character
    • a([^b]+)b
      • a matches a literally
      • ([^b]+) captures any character except b one or more times into a capture group
      • b matches b literally
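To make the effect of the branch reset concrete, here is a minimal sketch that runs the same alternation with and without (?|...). The variable names @plain and @reset are illustrative only and do not come from the answer.

```perl
use strict;
use warnings;

my $var = "n123a456ba789ba101112ba131415b";

# Plain alternation: two numbered groups, so every match returns ($1, $2)
# and the group that did not take part in that match is undef.
my @plain = $var =~ /n([^a]+)(?=a)|a([^b]+)b/g;

# Branch reset: both alternatives write to group 1, so each match
# contributes exactly one value.
my @reset = $var =~ /(?|n([^a]+)(?=a)|a([^b]+)b)/g;

print scalar(@plain), " values without branch reset\n";
print scalar(@reset), " values with branch reset\n";
print join(',', @reset), "\n";
```

Without the branch reset the list would have to be filtered with grep { defined } before use, which is the extra step the (?|...) group avoids.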

$var = "n123a456ba789ba101112ba131415b";
($n, @groups) = $var =~ /(?|n([^a]+)(?=a)|a([^b]+)b)/g;
print join(',', $n, @groups);

The sample lines look consistent in format.
Here is a quick way to collect everything into a single array; if needed, the result can be split into separate variables by assigning to my ($n, @vals) instead.

$_ = "n123a456ba789ba101112ba131415b";
my @vals = /[na]([^ab]*)/g;
print join(',', @vals)

Output:

123,456,789,101112,131415
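The split into my ($n, @vals) mentioned above could look like the sketch below. It assumes, as in the sample data, that the values themselves never contain the letters n, a, or b; the variable name $line is illustrative.

```perl
use strict;
use warnings;

my $line = "n123a456ba789ba101112ba131415b";

# The first capture comes from the n field; the remaining captures
# come from the a...b fields.
my ($n, @vals) = $line =~ /[na]([^ab]*)/g;

print "n = $n\n";
print "fields = ", join(',', @vals), "\n";
```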

Please check whether the following code meets your requirements.

use strict;
use warnings;

use feature 'say';

use Data::Dumper;

my %hash;

while( <DATA> ) {
    if( /^n(\d+)/ ) {
        my $n = $1;
        my @data = /a(\d+?)b/g;
        $hash{$n} = \@data;
    } 
}

say Dumper(\%hash);

__DATA__
n123a456ba789ba101112ba131415b
n124a12345ba78910ba101113b
n125a1234ba7891ba101114ba131415ba16171819b

Output:

$VAR1 = {
          '123' => [
                     '456',
                     '789',
                     '101112',
                     '131415'
                   ],
          '125' => [
                     '1234',
                     '7891',
                     '101114',
                     '131415',
                     '16171819'
                   ],
          '124' => [
                     '12345',
                     '78910',
                     '101113'
                   ]
        };
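Since Perl does not guarantee hash key order, the dump above lists the keys in no particular order. If one line per record is preferred, a sketch along these lines sorts the keys numerically and prints each record as comma-separated values. The @lines array stands in for the DATA section and is an assumption of this example.

```perl
use strict;
use warnings;

my @lines = (
    "n123a456ba789ba101112ba131415b",
    "n124a12345ba78910ba101113b",
    "n125a1234ba7891ba101114ba131415ba16171819b",
);

my %hash;
for my $line (@lines) {
    next unless $line =~ /^n(\d+)/;
    my $n = $1;
    # Store the a...b values as an array reference keyed by the n value.
    $hash{$n} = [ $line =~ /a(\d+?)b/g ];
}

# Sort the keys numerically so the output order is stable.
for my $n (sort { $a <=> $b } keys %hash) {
    print join(',', $n, @{ $hash{$n} }), "\n";
}
```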

Another way is to use split:

use strict;
use warnings;
use Data::Dump qw(dump);

while(<DATA>) {
    chomp;
    my @l = split /[nab]+/, $_;
    shift @l;
    dump@l;
}

__DATA__
n123a456ba789ba101112ba131415b
n124a12345ba78910ba101113b
n125a1234ba7891ba101114ba131415ba16171819b

Output:

(123, 456, 789, 101112, 131415)
(124, 12345, 78910, 101113)
(125, 1234, 7891, 101114, 131415, 16171819)
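If the n value and the field values are wanted in separate variables, the same split can be unpacked directly, without the non-core Data::Dump module. This is a sketch under the same assumption as above, namely that the values never contain n, a, or b.

```perl
use strict;
use warnings;

my $line = "n123a456ba789ba101112ba131415b";

# split keeps the empty leading field (the text before the first "n"),
# so it is discarded with undef while unpacking.
my (undef, $n, @groups) = split /[nab]+/, $line;

print join(',', $n, @groups), "\n";
```

The undef placeholder plays the same role as the shift @l in the loop above.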
