Regex to match perl variable

I'm currently learning about regular expressions and I'm trying to create a regex to match any legal variable name in Perl.

This is what I wrote so far:

^\$[A-Za-z_][a-zA-Z0-9_]*

The only problem is that the regex also matches strings that contain special characters. For example, the string $a& matches.

What did I do wrong?

Thanks! Rotem

Parsing Perl is difficult, and the rules for what is and is not a variable are complicated. If you're attempting to parse Perl, consider using PPI instead. It can parse a Perl program and do things like find all the variables. PPI is what perlcritic uses to do its job.
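As a rough sketch of what "find all the variables" can look like with PPI, the snippet below parses a small, made-up source string and collects every symbol token. It assumes PPI is installed from CPAN; the sample source and the variable names in it are hypothetical.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical sample source, used only for illustration.
my $source = <<'END_SOURCE';
my $count = 0;
my @items = (1, 2, 3);
print "literal \$text\n";
$count += $_ for @items;
END_SOURCE

my $document = PPI::Document->new(\$source)
    or die "PPI could not parse the source";

# find() returns an array reference on success and a false value
# when nothing matches, so fall back to an empty list.
my $symbols = $document->find('PPI::Token::Symbol') || [];

# Keep each variable once, in order of first appearance.
my %seen;
my @variables = grep { !$seen{$_}++ } map { $_->content } @$symbols;

print "$_\n" for @variables;
```

Because PPI tokenizes the whole document, text inside string literals is kept as part of the string token rather than reported as a separate symbol.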

If you want to try to do it anyway, here are some edge cases to consider:

$^F
$/
${^ENCODING}
$1
$élite           # with utf8 on
${foo}
*{foo} = \42;
*{$name} = \42;  # with strict off
${$name} = 42;   # with strict off

And of course there are the other sigils, @, % and *, plus detecting whether something is inside a single-quoted string. Which is my way of strongly encouraging you to use PPI rather than try to do it yourself.

If you want practice, a more realistic exercise is to pull variables out of a larger string rather than do exact matches.

# Match the various sigils.
my $sigils         = qr{ [\$\@\%*] }x;

# Match $1 and @1 and so on
my $digit_var      = qr{ $sigils \d+ }x;

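# Match normal variables: the first character is a word character
# that is not a digit ([^\W\d]), followed by any word characters
my $named_var      = qr{ $sigils [^\W\d] \w* }x;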

# Combine all the various variable matches
my $match_variable = qr{ ( $named_var | $digit_var ) }x;

This uses the () capture operator to grab just the variable. It also uses the /x modifier to make the regex easier to read, and alternative delimiters to avoid leaning toothpick syndrome. Using \w instead of A-Z ensures that Unicode characters will be picked up when utf8 is on, and that they won't when it's off. Finally, qr is used to build up the regex in pieces. Filling in the gaps is left as an exercise.
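Here is a small sketch of using the combined pattern to pull variables out of a longer string. The input line is made up for illustration, and the first character of a named variable is written as [^\W\d] (a word character that is not a digit).

```perl
use strict;
use warnings;

my $sigils         = qr{ [\$\@\%*] }x;
my $digit_var      = qr{ $sigils \d+ }x;
my $named_var      = qr{ $sigils [^\W\d] \w* }x;
my $match_variable = qr{ ( $named_var | $digit_var ) }x;

# Hypothetical line of Perl source; single quotes keep it literal.
my $text = 'my $total = $price * $qty + $1; push @items, $total;';

# In list context, /g returns the captured group from every match.
my @found = $text =~ /$match_variable/g;

print join(', ', @found), "\n";
# prints: $total, $price, $qty, $1, @items, $total
```

The multiplication sign is skipped here only because it is followed by a space. Something like 2*x would be picked up as a glob, which is one of the gaps still left to fill.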

You need a $ at the end, otherwise it just matches as far as it can and ignores the rest. So it should be:

^\$[A-Za-z_][A-Za-z0-9_]*$
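To see the difference the end anchor makes, here is a small sketch that runs the same pattern with and without it against a few made-up candidate strings. It uses \z rather than $ for the end anchor, which is my own substitution: $ also matches just before a trailing newline, while \z only matches at the very end of the string.

```perl
use strict;
use warnings;

# Same pattern twice; only the second is anchored at the end of the string.
my $prefix_only = qr/^\$[A-Za-z_][A-Za-z0-9_]*/;
my $whole_name  = qr/^\$[A-Za-z_][A-Za-z0-9_]*\z/;

# Hypothetical candidates, single-quoted so nothing is interpolated.
for my $candidate ('$a', '$a&', '$_tmp1', '$1abc') {
    my $prefix = $candidate =~ $prefix_only ? 'match' : 'no match';
    my $whole  = $candidate =~ $whole_name  ? 'match' : 'no match';
    printf "%-7s prefix-only: %-8s whole-string: %s\n",
        $candidate, $prefix, $whole;
}
```

With the unanchored pattern, $a& matches because the regex is satisfied by the leading $a and never looks at the &. The anchored version rejects it.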

I needed to solve this problem to create a simple source code analyzer. This subroutine extracts Perl user variables from a line of input code:

sub extractVars {
    my $line = shift;
    chomp $line;
    $line =~ s/#.*//;       # Remove comments
    $line =~ s/\s*;\s*$//;  # Remove trailing ;
    my @vars = ();
    my $match = 'junk';
    while ($match ne '') {
        push @vars, $match if $match ne 'junk';
        $match = ''; 
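        if ($line =~ s/(
                [\@\$\%]            # sigil: scalar, array, or hash
                \{?                 # optional opening brace
                \$?                 # optional dereference sigil
                [^\W\d]             # first character: letter or underscore
                [\w\-\>\${}\[\]'"]* # rest of name, subscripts, arrows
                [\w}\]]             # last character of the expression
                |
                [\@\$\%]            # sigil: scalar, array, or hash
                \{?                 # optional opening brace
                \$?                 # optional dereference sigil
                [^\W\d]             # one-character variable name
                [}\]]?              # optional closing brace or bracket
                )//x) {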
            $match = $1;
            next;
        }
    }
    return @vars;
}

Test it with this code:

my @variables = extractVars('$a $a{b} $a[c] $scalar @list %hash $list[0][1] $list[-1] $hash{foo}{bar} $aref->{foo} $href->{foo}->{bar} @$aref %$hash_ref %{$aref->{foo}} $hash{\'foo\'} "$a" "$var{abc}"');

It does NOT work if the variable expression contains spaces, for example:

  • $hash{"baz qux"}
  • ${ $var->{foo} }[0]


 