Perl - Problem with “]” in a regular expression

I have a string:

my $string = "name_of_my_function(arg1,arg2,[arg3,arg4])";

and I want to extract the name of the function, "name_of_my_function", and the parameters:

$arg1 = "arg1"
$arg2 = "arg2"
@arg_list = ("arg3", "arg4")

The code I use to extract the function name is:

$string =~ m/^([^\(]*)\(([^\)]*)\)/;
$function = $1;

It works when the string doesn't contain any "]", for example:

my $string = "name_of_my_function(arg1,arg2,arg3)";

but it doesn't return anything when there is a "]".

Any idea?

Thanks,

SLP

The regex you show captures the function name and all the arguments together in one string, which is a very reasonable first step. Then parse the arguments out of that second string. Here I expand your $string to have multiple bracketed lists of arguments, interleaved with non-bracketed ones:

perl -wE'
    $s = "name_of_my_function(arg1,arg2,[arg3,arg4],arg5,[arg6,arg7])"; 
    @m = $s =~ /^([^\(]*)\(([^\)]*)\)/; 
    @p = grep { $_ } split /\s*,\s*|\[(.*?)\]/, $m[1];
    for (@p) { 
        if (/,/) { push @arg_list, $_ }
        else     { push @args, $_ }
    }
    say $m[0];
    say for @args; 
    say for @arg_list
'

This prints

name_of_my_function
arg1
arg2
arg5
arg3,arg4
arg6,arg7

The split is where the individual arguments are extracted, along with each bracketed argument list, the latter as a single string. It may also return empty (or undefined) elements, hence the grep { $_ } to filter them out.

Then you can extract the individual arguments from the lists that were in brackets by splitting each string in @arg_list on the comma again.
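That last step can be sketched as follows. This is a minimal example, assuming @arg_list already holds the comma-separated strings produced by the code above; the name @lists is introduced here only for illustration.

```perl
use strict;
use warnings;

# Bracketed lists as returned by the split above, each still one string.
my @arg_list = ('arg3,arg4', 'arg6,arg7');

# Split every string on commas (allowing spaces around them) and keep
# each list as its own array reference. @lists is a hypothetical name.
my @lists = map { [ split /\s*,\s*/, $_ ] } @arg_list;

# Print one list per line, elements separated by a space.
print join(' ', @$_), "\n" for @lists;
```

Keeping each list as an array reference preserves which arguments belonged to which bracket pair.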


As the problem stands, the main part of the above can go in one statement:

@p = grep { $_ } split /\( | \) | \[(.*?)\] |,/x, $s;

where I added the /x modifier so that the pattern can be spaced out for readability. This delivers to @p the function name, the individual arguments, and a string with the (comma-separated) argument list from each [] .
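To see what that statement returns, here is a small sketch run on the string from the question. One deliberate difference, which is my assumption and not part of the answer: the filter tests defined and length instead of plain truth, so that an argument that is literally "0" would not be dropped.

```perl
use strict;
use warnings;

my $s = 'name_of_my_function(arg1,arg2,[arg3,arg4])';

# split returns the text between delimiters plus the capture group, which
# is undef whenever a delimiter other than [...] matched. The grep drops
# those undef entries and the empty strings between adjacent delimiters.
my @p = grep { defined($_) && length($_) }
        split /\( | \) | \[(.*?)\] |,/x, $s;

# First element is the function name, the rest are arguments.
my ($function, @rest) = @p;
print "$function\n";
print "$_\n" for @rest;
```

The bracketed list comes back as the single string "arg3,arg4", so it still needs a second split on the comma.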

However, I think that it is far more sensible to break this up into several steps.

Well, if the number of arguments is variable, it is not that simple to do with a regex alone: the arguments are matched by a group under a + quantifier, and a repeated capturing group keeps only its last match, so the individual arguments cannot simply be read out of capture groups. With that in mind, you could use this pattern: (\w+)\(((\w+|\[(\w+,?)+\]),?)+\)

Explanation:

(\w+) - match one or more word characters (the name of the function) and store it in the first capturing group,

(\w+|\[(\w+,?)+\]) - alternation: match \w+ (same as above) or \[(\w+,?)+\] : \[ - match [ literally, (\w+,?)+ - match one or more times the pattern \w+,? , which is one or more word characters followed by zero or one comma ( ,? ), \] - match ] literally,

((\w+|\[(\w+,?)+\]),?)+ - match the whole pattern above, optionally followed by a comma ( ,? ), one or more times. This matches the argument list.

\( , \) - match ( , ) literally
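The limitation with repeated groups can be checked directly. The sketch below writes the pattern as a Perl regex literal with single backslashes and adds ^ and $ anchors, which are my assumption and not part of the answer's pattern; @groups is a hypothetical name.

```perl
use strict;
use warnings;

my $s = 'name_of_my_function(arg1,arg2,[arg3,arg4])';

# In list context the match returns the contents of groups 1 to 4.
my @groups = $s =~ /^(\w+)\(((\w+|\[(\w+,?)+\]),?)+\)$/;

print "group 1: $groups[0]\n";   # the function name
print "group 2: $groups[1]\n";   # only the last repetition: [arg3,arg4]
print "group 4: $groups[3]\n";   # last item inside the brackets: arg4
```

The pattern validates the whole call, but arg1 and arg2 are not available in any group afterwards, which is why the arguments are better extracted in a separate step.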

Further processing: extract what is between the parentheses () and pull the arguments out of that list programmatically; that is easier than doing it all with one complex regular expression.
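One possible way to do that further processing is sketched below. It assumes arguments contain no nested brackets or parentheses; parse_call is a hypothetical helper name, not something from either answer.

```perl
use strict;
use warnings;

# parse_call is a hypothetical helper: it returns the function name, a
# reference to the plain arguments and a reference to the bracketed lists.
sub parse_call {
    my ($s) = @_;

    # Step 1: function name and everything between the outer parentheses.
    my ($function, $inner) = $s =~ /^(\w+)\((.*)\)$/ or return;

    # Step 2: walk the argument string. Each match is either a bracketed
    # list (group 1) or a plain argument (group 2).
    my (@args, @lists);
    while ($inner =~ /\[([^\]]*)\]|([^,\[\]\s]+)/g) {
        my ($list, $plain) = ($1, $2);   # copy before running split
        if (defined $list) { push @lists, [ split /\s*,\s*/, $list ] }
        else               { push @args, $plain }
    }
    return ($function, \@args, \@lists);
}

my ($function, $args, $lists) =
    parse_call('name_of_my_function(arg1,arg2,[arg3,arg4])');

print "$function\n";
print "arg: $_\n" for @$args;
print "list: @$_\n" for @$lists;
```

Because the brackets themselves decide what counts as a list, a single-element list such as [arg3] is still recognised as a list, which a test for a comma would miss.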

Demo

UPDATE :

Try pattern: https://regex101.com/r/wBcJZ0/3

I omitted the explanation, as it is very similar to the previous pattern.

Updated demo
