简体   繁体   中英

Perl: regex for conditional replace?

in this string

ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<

I want to replace each substring between XY and < by either ONE or TWO depending on characters between previous brackets:

if XY after (CN) replace substring by ONE

if XY after (CI) replace substring by TWO

So the result should be:

ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<

XY and following characters should be replaced but not angle bracket < .

This is for modifying HTML and arbitrary characters can occur between XY and < . I guess I need two regex for (CN) and (CI).

# This one replaces just all XY:   
my $s = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';
$s =~ s/(XY(.*?))</ONE/g;    
# But how to add the conditions to the regex?

You don't need two regexes. Capture the C[NI] and retrieve the corresponding replacement value from a hash:

#!/usr/bin/perl
use warnings;
use strict;

my $s = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';

my %replace = (CN => 'ONE', CI => 'TWO');

$s =~ s/(\((C[NI])\).*?)XY.*?</$1$replace{$2}</g;

my $exp = 'ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<';

use Test::More tests => 1;
is $s, $exp;

My guess is that this expression or maybe a modified version of that might work, not sure though:

([a-z]{2}<\([A-Z]{2}\)[a-z]{2})([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)(<\([A-Z]{2}\)[a-z])([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)<

Test

use strict;
use warnings;

my $str = 'ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';
my $regex = qr/([a-z]{2}<\([A-Z]{2}\)[a-z]{2})([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)(<\([A-Z]{2}\)[a-z])([^<]+)(<\([A-Z]{2}\)[a-z]{3})([^<]+)</mp;
my $subst = '"$1ONE$3TWO$5ONE$7TWO<"';

my $result = $str =~ s/$regex/$subst/rgee;

print $result;

The expression is explained on the top right panel of this demo , if you wish to explore/simplify/modify it, and in this link , you can watch how it would match against some sample inputs step by step, if you like.

This can be done in one line regex using /e and ternary operator ? in the /replace/ . /r option returns the resulting string, in effect this would keep the original string $s unmodified.

use strict;
use warnings;

my $s ='ab<(CN)cdXYlm<(CI)efgXYop<(CN)zXYklmn<(CI)efgXYuvw<';
print (($s=~s/\(([^)]+)\)([^(]+)XY[^(]+</"($1)$2".(($1 eq CN)?ONE:TWO)."<"/gre)."\n");

Output:

ab<(CN)cdONE<(CI)efgTWO<(CN)zONE<(CI)efgTWO<

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM