Perl regular expression variables and matched pattern substitution

Can anyone explain regular expression text substitutions when the regular expression is held in a variable? I'm trying to process some text, Clearcase config specs actually, and substitute text as I go. The rules for the substitution are held in an array of hashes that have the regular expression to match and the text to substitute.

The input text looks something like this:

element  /my_elem/releases/...  VERSION_STRING.020 -nocheckout

Most of the substitutions simply remove lines that contain a specific text string, and this works fine. In some cases I want to substitute the text but re-use the VERSION_STRING text. I've tried using $1 in the substitution expression, but it doesn't work: $1 captures the version string in the match, but it is not expanded in the replacement text.

In these cases the output should look something like this:

element  -directory  /my_elem/releases/... VERSION_STRING.020 -nocheckout
element  /my_elem/releases/.../*.[ch]  VERSION_STRING.020 -nocheckout

i.e., one input line becomes two output lines, and the version string is re-used.

The code looks something like this. First the regular expressions and substitutions:

my @Special_Regex = (   
                  { regex => "\\s*element\\s*\/my_elem_removed\\s*\/main\/\\d+\$",                  subs => "# Line removed" },
                  { regex => "\\s*element\\s*\/my_elem_changed\/releases\/\.\.\.\\s*\(\.\*\$\)", 
                    subs => "element  \-directory  \/my_elem\/releases\/\.\.\. \\1\nelement  \/my_elem\/releases\/\.\.\.\/\*\.\[ch\]  \\1" }

                );

In the second regex, $1 is captured by the group (.*\$), and this is working correctly. The subs expression does not substitute it, however.

 foreach my $line (<INFILE>)
        {
        chomp($line);
        my $test = $line;
        foreach my $hash (@Special_Regex)
        {
            my $regex = qr/$hash->{regex}/is;
            if($test =~ s/$regex/$hash->{subs}/)
                {
                print "$test\n";
                print "$line\n";
                print "$1\n";
                }
         }
}

What am I missing? Thanks in advance.

The replacement side of your s/// is interpolated only once, which turns $hash->{subs} into the string it holds; any variables inside that string are left as literal text. You need to evaluate it again to interpolate its internal variables. The e modifier tells Perl to treat the replacement as an expression to evaluate rather than as a string, and you can stack e flags to evaluate more than once (if you have a problem that needs it). As tchrist helpfully points out, this case needs ee: the first evaluation just expands $hash->{subs} to its string, and the second evals that string as Perl code, which expands the variables inside it. For the second pass to work, the stored string has to be a valid Perl expression, for example a quoted string that contains $1.

You can find more detail about the s operator in perlop.
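As a rough illustration of the ee approach, here is a minimal sketch. The %rule hash, the apply_rule helper and the simplified pattern are made up for this example and are not the asker's actual rules. Note that the stored replacement includes its own double quotes, so the second evaluation sees a Perl string expression. Keep in mind that ee runs whatever code the stored string contains, so it is only reasonable when the rules come from a trusted source.

```perl
use strict;
use warnings;

# Hypothetical rule: the replacement is stored as Perl source for a
# double-quoted string, so the quotes are part of the stored text.
my %rule = (
    regex => qr{^\s*element\s+(\S+)\s+(.*)$},
    subs  => '"element  -directory  $1 $2"',
);

sub apply_rule {
    my ($line) = @_;
    # First e: evaluate $rule{subs} to get the stored text.
    # Second e: eval that text as Perl, which interpolates $1 and $2.
    (my $out = $line) =~ s/$rule{regex}/$rule{subs}/ee;
    return $out;
}

print apply_rule('element  /my_elem/releases/...  VERSION_STRING.020 -nocheckout'), "\n";
# prints: element  -directory  /my_elem/releases/... VERSION_STRING.020 -nocheckout
```

If the second evaluation fails to compile, the match is replaced with an empty value, so it is worth checking $@ while developing rules like this.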

There is no compiled form for a replacement expression (nothing like qr// for the right-hand side). So about the only thing you can do is eval it with the e flag:

if($test =~ s/$regex/eval qq["$hash->{subs}"]/e ) { #...

This worked for me after changing \\1 to \$1 in the replacement strings.

s/$regex/$hash->{subs}/

only replaces the matched part with the literal value stored in $hash->{subs}; Perl does not rescan that value for variables. To get the substitution working, you have to force Perl to evaluate the stored value as a double-quoted string, which means adding the double quotes back in to get the interpolating behavior you are looking for (they are not part of the stored string).
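The difference can be seen side by side in a small sketch. The pattern, the replacement text and the two helper names (plain_subst, eval_subst) are invented for illustration; only the two substitution forms come from the discussion above.

```perl
use strict;
use warnings;

my $regex = qr{(\w+)\.(\d+)};       # illustrative pattern with two captures
my $subs  = 'version $2 of $1';     # single quotes: $1 and $2 are literal text here

sub plain_subst {
    my ($text) = @_;
    # $subs is interpolated once; its contents are not rescanned for variables.
    $text =~ s/$regex/$subs/;
    return $text;
}

sub eval_subst {
    my ($text) = @_;
    # Wrap the stored text in double quotes, then eval it as a Perl string.
    $text =~ s/$regex/eval qq["$subs"]/e;
    return $text;
}

print plain_subst('VERSION_STRING.020'), "\n";   # version $2 of $1
print eval_subst('VERSION_STRING.020'),  "\n";   # version 020 of VERSION_STRING
```

The eval form assumes the stored text contains no unescaped double quotes, since those would end the generated string early.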

But that's kind of clumsy, so I changed the replace expressions into subs:

my @Special_Regex 
    = ( 
        { regex => qr{\s*element\s+/my_elem_removed\s*/main/\d+$}
        , subs  => sub { '#Line removed' }
        }
    ,   { regex => qr{\s*element\s+/my_elem_changed/releases/\.\.\.\s*(.*$)}
        , subs  => sub { 
            return "element  -directory  /my_elem/releases/... $1\n"
                 . "element  /my_elem/releases/.../*.[ch]  $1"
                 ; 
          }
        }

    );

I got rid of a bunch of escapes that you don't need in a substitution expression. Since what you want to do is interpolate the value of $1 into the replacement string, the subroutine does simply that. And because $1 stays visible until something else is matched, it will hold the right value when the subroutine runs.

So now the replacement looks like:

s/$regex/$hash->{subs}->()/e

Making it pass $1 explicitly is a little more bulletproof, because you're not depending on the global $1:

s/$regex/$hash->{subs}->( $1 )/e

Of course, you would change the sub like so:

subs => sub {
    my $c1 = shift;
    return "element  -directory  /my_elem/releases/... $c1\n"
         . "element  /my_elem/releases/.../*.[ch]  $c1"
         ; 
}
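Putting the pieces together, a self-contained version of the loop might look like the sketch below. The rewrite_line helper, the ^ anchors and the sample input lines are assumptions added for the example; the rule table follows the code-ref approach described above.

```perl
use strict;
use warnings;

# Rule table: each replacement is a code ref that receives the captured text.
my @Special_Regex = (
    {   regex => qr{^\s*element\s+/my_elem_removed\s*/main/\d+$},
        subs  => sub { '# Line removed' },
    },
    {   regex => qr{^\s*element\s+/my_elem_changed/releases/\.\.\.\s*(.*)$},
        subs  => sub {
            my $version = shift;
            return "element  -directory  /my_elem/releases/... $version\n"
                 . "element  /my_elem/releases/.../*.[ch]  $version";
        },
    },
);

sub rewrite_line {
    my ($line) = @_;
    for my $rule (@Special_Regex) {
        # Stop at the first rule that matches and rewrites the line.
        return $line if $line =~ s/$rule->{regex}/$rule->{subs}->($1)/e;
    }
    return $line;    # no rule matched: pass the line through unchanged
}

# Sample input, made up for illustration.
my @input = (
    'element  /my_elem_removed  /main/3',
    'element  /my_elem_changed/releases/...  VERSION_STRING.020 -nocheckout',
    'element  /other/...  /main/LATEST',
);
print rewrite_line($_), "\n" for @input;
```

Because the replacement is an ordinary subroutine call, nothing from the rule table is ever passed to a string eval.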

Just one last note: "\.\.\." didn't do what you think it did. Inside a double-quoted string the backslashes are consumed before the regex engine ever sees them, so you just ended up with '...' in the regex, which matches any three characters.
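A quick way to check this is to compare the same pattern written with different quoting. The variable names and the full_match helper below are made up for the demonstration; the no warnings line is only there to silence the warning Perl normally gives for an unrecognized escape such as \. in a double-quoted string.

```perl
use strict;
use warnings;

# In a double-quoted string, \. is not a recognized escape, so the backslash is dropped.
my $from_double = do { no warnings 'misc'; "\.\.\." };   # holds three plain dots
my $from_single = '\.\.\.';                               # backslashes kept
my $from_qr     = qr{\.\.\.};                             # backslashes kept, precompiled

sub full_match {
    my ($text, $pattern) = @_;
    return $text =~ m{^$pattern\z} ? 1 : 0;
}

for my $text ('...', 'abc') {
    printf "%-4s double=%d single=%d qr=%d\n", $text,
        full_match($text, $from_double),
        full_match($text, $from_single),
        full_match($text, $from_qr);
}
# ...  double=1 single=1 qr=1
# abc  double=1 single=0 qr=0
```

Only the double-quoted version matches 'abc', which is why building the rules with qr// or single quotes is the safer habit.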
