Capturing optional strings using a Perl regex

I am trying to parse strings that have this pattern:

src [interface_name:source_address[/source_port]] 

where the parts in brackets are optional, so there are three possible variants:

src
src LAN:10.115.1.204
src LAN:10.115.1.204/8080

I want to capture the interface name, source IP and source port from this string.

My regex for the third variant is:

($srcinterface,$srcip,$src_port) = m/^src (.*?):(.*?)\/(.*?)/;

But I don't know how to write a regex that works for all three variants.
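For reference, a quick check shows why the regex above is not quite right even for the third variant. This is a minimal sketch ($str and @parts are illustrative names): the final (.*?) is lazy and nothing follows it in the pattern, so an empty string satisfies it and the port is never captured. The second variant contains no / at all, so the match fails outright.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's pattern, applied to the third (full) variant
my $str = 'src LAN:10.115.1.204/8080';
my ($srcinterface, $srcip, $src_port) = $str =~ m/^src (.*?):(.*?)\/(.*?)/;

# The trailing lazy group matches as little as possible, i.e. nothing
print "interface: $srcinterface\n";   # LAN
print "ip:        $srcip\n";          # 10.115.1.204
print "port:      '$src_port'\n";     # '' (empty, not 8080)

# The second variant has no '/', so the match fails and returns an empty list
my @parts = 'src LAN:10.115.1.204' =~ m/^src (.*?):(.*?)\/(.*?)/;
print scalar(@parts), " captures\n"; # 0 captures
```

Making the last group greedy and specific, e.g. (\d+), or anchoring the end of the pattern with $ avoids the empty port capture.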

EDIT: The bigger part of the problem is that dst information, like the src information, is also being received from the system, so I need to repeat the regex. See the strings below:

src dst outside:125.22.32.192
src outside:182.201.183.178 dst outside:125.22.32.192
src outside:182.201.183.178/5525 dst outside:125.22.32.192/8595

Use this instead:

/^src(?> (\w++):((?>[0-9]{1,3}\.){3}[0-9]{1,3})(?>\/([0-9]++))?)?/

An example script:

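#!/usr/bin/perl

use strict;

# The three variants, one per line
my $str = "src
src LAN:10.115.1.204
src LAN:10.115.1.204/8080";
my $i = 0;

# /m lets ^ match at the start of each line; /g walks through every match.
# Capture groups for absent optional parts are undef (printed as empty here).
while ($str =~ /^src(?> (\w++):((?>[0-9]{1,3}\.){3}[0-9]{1,3})(?>\/([0-9]++))?)?/gm) {
    print "\n[match " . ++$i . "]"
        . "\nWhole match    : $&"
        . "\nCapture group 1: $1"
        . "\nCapture group 2: $2"
        . "\nCapture group 3: $3\n";
}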

For a more permissive pattern, you can use this:

/^src(?> (\w++):([^\/\n]++)(?>\/([^\n]++))?)?/gm

or this:

/^src(?> (\w++):([^\/\n]++)(?>\/(\S++))?)?/gm

The idea behind these patterns is to use negated character classes: for example, [^\/\n] matches any character that is not a slash or a newline. You can easily adapt these classes to your needs by adding or removing characters.
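Applied to the strings in the EDIT, the [^\/\n] class is too permissive: on a line such as src outside:182.201.183.178 dst outside:125.22.32.192 it runs past the space, so the whole dst part ends up in the src address. Below is a minimal sketch, assuming fields are always separated by single spaces, that swaps \n for \s in the class and reuses one endpoint sub-pattern for both src and dst. The names $endpoint, $line_re and parse_line are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One optional endpoint: " iface:address[/port]".
# [^/\s] stops the address at a slash or at any whitespace, so it cannot
# run on into a following " dst ..." part (assumes single-space separators).
my $endpoint = qr{(?: (\w+):([^/\s]+)(?:/(\d+))?)?};

# Interpolating $endpoint twice: groups 1-3 belong to src, groups 4-6 to dst.
my $line_re = qr{^src$endpoint(?: dst$endpoint)?};

# Hypothetical helper: returns (src_if, src_ip, src_port, dst_if, dst_ip, dst_port),
# with undef for absent parts, or an empty list if the line does not start with "src".
sub parse_line {
    my ($line) = @_;
    return $line =~ $line_re;
}

for my $line (
    'src dst outside:125.22.32.192',
    'src outside:182.201.183.178 dst outside:125.22.32.192',
    'src outside:182.201.183.178/5525 dst outside:125.22.32.192/8595',
) {
    # Show absent parts as "-"
    print join(' | ', map { defined $_ ? $_ : '-' } parse_line($line)), "\n";
}
```

For the three EDIT lines this should print:

- | - | - | outside | 125.22.32.192 | -
outside | 182.201.183.178 | - | outside | 125.22.32.192 | -
outside | 182.201.183.178 | 5525 | outside | 125.22.32.192 | 8595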

I'm no Perl guru, but maybe this works:

# The trailing $ anchors the end, so the lazy (.*?) groups cannot stop early
($srcinterface,$srcip,$src_port) = m/^src\s*(?:(.*?):(.*?)(?:\/(.*?))?)?$/;

(?:...) makes a non-capturing ("hidden") group, and a ? after a group makes that group optional.

Well, the readability goes haywire...
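To see what the non-capturing, optional groups in this answer do, here is a quick check on the three variants (a minimal sketch; @results is just an illustrative name). Note that the pattern here ends with $: without an end anchor, the lazy (.*?) after the colon is satisfied by an empty string, so the address would come back empty for the second and third variants.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @results;
for ('src', 'src LAN:10.115.1.204', 'src LAN:10.115.1.204/8080') {
    # (?:...) groups without capturing; the trailing ? makes the group optional.
    # The final $ forces the lazy (.*?) groups to reach the end of the string.
    my ($srcinterface, $srcip, $src_port) = m/^src\s*(?:(.*?):(.*?)(?:\/(.*?))?)?$/;
    # Optional parts that did not take part in the match are undef
    push @results, [ map { defined $_ ? $_ : 'undef' } $srcinterface, $srcip, $src_port ];
}
print join(', ', @$_), "\n" for @results;
```

This should print undef, undef, undef for the bare src, then LAN, 10.115.1.204, undef, and finally LAN, 10.115.1.204, 8080.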

It's not clear which of the fields are optional, but you can simply split on a regular expression to separate what is there.

In this program, the @fields array will contain as many fields as were specified. Assuming optional fields disappear from the right (i.e. there can be no source address without an interface name, and no source port without both a name and an address), you can simply count the elements of @fields to see which were provided.

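use strict;
use warnings;

use Data::Dump;    # CPAN module (not core Perl); provides dd() for dumping data

for (
    'src',
    'src LAN:10.115.1.204',
    'src LAN:10.115.1.204/8080') {

    # With no string argument, split works on $_ (the current loop value),
    # breaking it at every run of slashes and/or whitespace
    my @fields = split /[\/\s]+/;

    dd \@fields;
}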

Output:

["src"]
["src", "LAN:10.115.1.204"]
["src", "LAN:10.115.1.204", 8080]
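If you also want the interface name and the address as separate fields, one option (a variation, not part of the original answer) is to add : to the split pattern. Under the same assumption that optional fields disappear from the right, the field count still tells you which parts were supplied. This minimal sketch prints with plain print instead of the non-core Data::Dump; @parsed is an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @parsed;
for ('src', 'src LAN:10.115.1.204', 'src LAN:10.115.1.204/8080') {
    # Split on any run of ':', '/' or whitespace
    my @fields = split /[:\/\s]+/;
    push @parsed, \@fields;
    # Field count: 1 = keyword only, 3 = interface and address, 4 = with port
    print scalar(@fields), ": @fields\n";
}
```

The three lines come out as 1: src, then 3: src LAN 10.115.1.204, then 4: src LAN 10.115.1.204 8080.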

This regex worked for me:

($srcinterface, $srcip, $src_port) = m@^src(?: ([^:]+):([^/]+))?(?(1)(?:/(.+))?)@;

Notes:

  • I'm using negated character classes (e.g. [^:]) with + because .*? would cause trouble for variants 2 and 3: nothing after the final .*? forces it to consume any characters, so it happily matches a zero-length string.

  • I made the interface_name:source_address part optional with an enclosing (?:...)?

  • Then I used the conditional regex (?(1)pattern), which means “match pattern only if capture group 1 matched successfully”.

    Effectively, if interface_name:source_address is matched, look for /port.

  • Since /port is optional, I wrapped that part in another (?:...)? inside the conditional regex.
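A quick check of the conditional pattern on the three variants, as a minimal sketch (%parsed is an illustrative name). Here the space after src sits inside the optional group, so a bare src with no trailing space also matches; with a literal space right after ^src, the first variant would only match if the string ended in a space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %parsed;
for my $str ('src', 'src LAN:10.115.1.204', 'src LAN:10.115.1.204/8080') {
    # (?(1)...) only tries the optional /port part if group 1 (the interface) matched
    my ($srcinterface, $srcip, $src_port) =
        $str =~ m@^src(?: ([^:]+):([^/]+))?(?(1)(?:/(.+))?)@;
    $parsed{$str} = [ $srcinterface, $srcip, $src_port ];
}

# Print each result, showing parts that did not match as "-"
for my $str (sort keys %parsed) {
    print "$str => ", join(', ', map { defined $_ ? $_ : '-' } @{ $parsed{$str} }), "\n";
}
```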

For what it's worth, I think Borodin's split-based answer is way simpler, and Casimir et Hippolyte's regex-based answer is more robust since it actually validates each component. I'm just posting this for the sake of completeness.
