Replace strings only within a regex match in perl

I have an XML document with text in attribute values. I can't change how the XML file is generated, but I need to extract the attribute values without losing the \r\n line breaks. The XML parser, of course, strips them out.

So I'm trying to replace \r\n in attribute values with entity references. I'm using perl to do this because of its non-greedy matching. But I need help getting the replacement to happen only within the match. Or I need an easier way to do this :)

Here is what I have so far:

perl -i -pe 'BEGIN{undef $/;} s/m_description="(.*?)"/m_description="$1"/smg' tmp.xml

This matches what I need to work with: (.*?). But I don't know how to expand that pattern to match \r\n inside it and do the replacement in the results. If I knew how many \r\n sequences there were I could do it, but it seems I need a variable number of capture groups or something like that. There's a lot about regex I don't understand, and it seems like there should be some way to do this.

Example:

preceding lines 
stuff m_description="Over
any number
of lines" other stuff
more lines

Should go to:

preceding lines 
stuff m_description="Over&#13;&#10;any number&#13;&#10;of lines" other stuff
more lines

Solution

Thanks to ikegami and ysth for the solution I used, which for 5.14+ is:

perl -i -0777 -pe's/m_description="\K(.*?)(?=")/ $1 =~ s!\n!&#10;!gr =~ s!\r!&#13;!gr /sge' tmp.xml

. should already match \n (because you specify the /s flag) and \r.

To do the replacement in the results, use /e:
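A quick way to check the claim about /s, as a minimal sketch with a made-up sample string (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical attribute text containing a CRLF line break.
my $value = "Over\r\nany number";

# Without /s, "." matches "\r" but not "\n", so the match cannot cross the line break.
my $without_s = $value =~ /Over.*number/  ? 1 : 0;

# With /s, "." matches "\n" as well, so the match spans both lines.
my $with_s    = $value =~ /Over.*number/s ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

Only "\n" is affected by /s; "\r" is an ordinary character to "." in both cases.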

perl -i -0777 -pe's/(?<=m_description=")(.*?)(?=")/ my $replacement=$1; $replacement=~s!\n!&#10;!g; $replacement=~s!\r!&#13;!g; $replacement /sge' tmp.xml

I've also changed it to use lookbehind/lookahead to make the code simpler, to use -0777 to set $/ to slurp mode, and to drop the useless /m.
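The same idea can be tried without touching a file. This is only a sketch: the sample string and the helper name escape_line_breaks are invented for illustration, and the helper simply wraps the two inner substitutions so that /e has a single expression to call.

```perl
use strict;
use warnings;

# Hypothetical sample input with CRLF line endings, inside and outside the attribute.
my $xml = qq{stuff m_description="Over\r\nany number\r\nof lines" other stuff\r\nmore lines\r\n};

# Illustrative helper: turn raw CR and LF into numeric character references.
sub escape_line_breaks {
    my ($value) = @_;
    $value =~ s/\r/&#13;/g;
    $value =~ s/\n/&#10;/g;
    return $value;
}

# /e evaluates the replacement as Perl code, so the helper only ever sees the
# text captured between the quotes; /s lets "." cross line breaks.
(my $escaped = $xml) =~ s/(?<=m_description=")(.*?)(?=")/escape_line_breaks($1)/sge;

print $escaped;
```

Line breaks outside the attribute value are left alone, because the lookbehind and lookahead restrict the match to the text between the quotes.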

OK, so whilst this looks like an XML problem, it isn't. The XML problem is the person generating it. You should probably give them a prod with a rolled up copy of the spec as your first port of call for "fixing" this.

But failing that - I'd do a two pass approach, where I read the text, find all the 'blobs' that match a description, and then replace them all.

Something like this:

#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dumper;

my $text = do { local $/ ;  <DATA> }; 

#filter text for 'description' text: 
my @matches = $text =~ m{m_description=\"([^\"]+)\"}gms;

print Dumper \@matches; 

#Generate a search-and-replace hash
my %replace = map { $_ => s/[\r\n]+/&#13;&#10;/gr } @matches; 
print Dumper \%replace;

#turn the keys of that hash into a search regex
#quotemeta escapes any regex metacharacters in the matched text
my $search = join ( "|", map { quotemeta } keys %replace ); 
   $search = qr/"($search)"/ms; 

print "Using search regex: $search\n";
#search and replace text block
$text =~ s/m_description=$search/m_description="$replace{$1}"/mgs;

print "New text:\n";
print $text;

__DATA__
preceding lines 
stuff m_description="Over
any number
of lines" other stuff
more lines
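Because the search regex in this approach is built from the matched text itself, any regex metacharacters in a description need escaping before they go into the alternation. A minimal sketch of that step, assuming a hypothetical description that contains parentheses and a question mark (requires 5.14+ for /r):

```perl
use strict;
use warnings;

# Hypothetical input: the description contains "(", ")" and "?".
my $text = qq{a m_description="Cost (USD)?\r\nper unit" b\n};

# Pass 1: collect every description value and build the replacement for each.
my @matches = $text =~ m{m_description="([^"]+)"}g;
my %replace = map { $_ => s/\r\n/&#13;&#10;/gr } @matches;

# quotemeta makes each key match itself literally inside the alternation.
my $search = join '|', map { quotemeta } keys %replace;

# Pass 2: swap each original value for its escaped form.
(my $fixed = $text) =~ s/m_description="($search)"/m_description="$replace{$1}"/g;

print $fixed;
```

Without quotemeta, "(USD)?" would be read as an optional group and the original text would not be found, so the description would be left unchanged.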
