
perl regex within a regex

There are many times in Perl when I want to replace a matched string with itself after another replacement operation has been applied to the match. For example, I have an application where I need to find quoted strings and remove the spaces from them (in the code below, by replacing each space with an underscore). One way to do this would be:

for my $quoted ($str =~ /"([^"]+)"/g) {   # text inside each pair of quotes
    (my $fixed = $quoted) =~ s/ /_/g;     # work on a copy of the match
    $str =~ s/"\Q$quoted\E"/$fixed/;      # put the modified copy back
}

This also seems possible:

$str =~ s/"([^"]+)"/replace_spaces($1)/ge;   # /e evaluates the replacement as code
sub replace_spaces {
    my $word = shift;
    $word =~ s/ /_/g;
    return $word;
}

Is there a pure regex way of doing this, through nesting a regex within a regex somehow?

For the specific task at hand, you'd be better served by using Text::ParseWords:

#!/usr/bin/env perl

use strict; use warnings;
use feature 'say';
use Text::ParseWords;

my $input = q{This is "a t e s t " string. "Hello - world  !"};
my @words = shellwords $input;

for my $word ( @words ) {
    $word =~ s/ +//g;
    say "'$word'";
}

See also the perlfaq4 entry "How can I split a [character]-delimited string except when inside [character]?"
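If the goal is to get the whole string back rather than a list of words, one possible sketch along the same lines is to call quotewords with a true $keep argument, so the quote characters stay on each token, and then rewrite only the quoted tokens. The helper name underscore_quoted is made up for this illustration, and rejoining with a single space assumes that runs of whitespace outside the quotes do not need to be preserved.

```perl
#!/usr/bin/env perl
use strict; use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: replace spaces with underscores inside quoted tokens only.
sub underscore_quoted {
    my ($str) = @_;
    # A true second argument ($keep) leaves the quote characters on the tokens.
    my @tokens = quotewords('\s+', 1, $str);
    for my $token (@tokens) {
        $token =~ tr/ /_/ if $token =~ /^"/;   # only touch tokens that start with a quote
    }
    # Assumption: a single space between tokens is acceptable in the result.
    return join ' ', @tokens;
}

print underscore_quoted('a b "c d" e f'), "\n";   # prints: a b "c_d" e f
```

Unlike the substitutions in the question, this keeps the quotes in the output; strip them from the tokens if they are not wanted.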

Yes, you can do this, but each situation needs its own regular expression. There is no silver bullet here.

You need to replace spaces with underscores, but only those inside quote-delimited substrings, not all of them. That last condition can be checked with lookahead and lookbehind assertions, but such checks are not easy to formulate.

For example:

$ perl -pe 's/(?<=")(\S+)\s+(?=.*")/$1_/g;'
a b "c d" e f
a b "c_d" e f

But this regex is far from perfect: it only works in the simplest cases. It is not a solution, just a demonstration of the idea.
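To make the limitations concrete, here is a small script that runs the same substitution on a few inputs. The sub name demo and the sample strings are made up for illustration. The second input shows that only the first space inside a quoted string is replaced; the third shows a space outside the quotes being replaced.

```perl
use strict; use warnings;

# Apply the demonstration substitution from above to a copy of the input.
sub demo {
    my ($str) = @_;
    $str =~ s/(?<=")(\S+)\s+(?=.*")/$1_/g;
    return $str;
}

print demo('a b "c d" e f'), "\n";   # a b "c_d" e f   (the easy case)
print demo('a "c d e" f'), "\n";     # a "c_d e" f     (second space is missed)
print demo('a "b" c "d"'), "\n";     # a "b"_c "d"     (space outside quotes is hit)
```

The lookbehind only checks that the previous character is a quote and the lookahead only checks that some quote follows, so neither can tell an opening quote from a closing one.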

You could try:

   $str =~ s{"([^"]+)"}{do{(local$_=$1)=~y/ /_/;$_}}eg;

Or, for better readability:

   $str =~ s/
             "([^"]+)"     # capture everything inside the double quotes
            / do{          # start a do block
                 local $_ = $1; # get a copy from $1
                 y| |_|;        # transliterate ' ' to '_'
                 $_             # return string from block
                }          # end the do block
            /xeg;

Regards

rbo
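On Perl 5.14 or later, the same idea can probably be written without the do block, because the /r modifier makes s/// and tr/// return a modified copy instead of changing their target. This is only a sketch and not part of the answers above. The sample string and variable names are made up, and the first form drops the surrounding quotes just like the substitutions above.

```perl
use strict; use warnings;
use 5.014;   # the /r modifier needs Perl 5.14+; this also enables say

my $str = 'a b "c d e" f "g h"';   # sample input, made up for illustration

# The outer s///e evaluates the replacement as code; the inner s///r returns
# a modified copy of $1 and leaves $1 itself untouched.
(my $without_quotes = $str) =~ s/"([^"]+)"/ $1 =~ s{ }{_}gr /ge;

# Same idea with tr///r, putting the quotes back around the rewritten text.
(my $with_quotes = $str) =~ s/"([^"]+)"/ '"' . $1 =~ tr{ }{_}r . '"' /ge;

say $without_quotes;   # a b c_d_e f g_h
say $with_quotes;      # a b "c_d_e" f "g_h"
```

This is about as close to "a regex within a regex" as Perl gets: the nesting happens through /e, with the inner substitution running as ordinary code for each match.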
